Abstract

This paper presents an approach to verify safety properties of
Erlang-style, higher-order concurrent programs automatically.
Inspired by Core Erlang, we introduce λActor, a prototypical
functional language with pattern-matching algebraic data types,
augmented with process creation and asynchronous message-passing
primitives. We formalise an abstract model of λActor programs
called Actor Communicating System (ACS) which has a natural
interpretation as a vector addition system, for which some
verification problems are decidable. We give a parametric abstract
interpretation framework for λActor and use it to build a polytime
computable, flow-based, abstract semantics of λActor programs, which
we then use to bootstrap the ACS construction, thus deriving a more
accurate abstract model of the input program. We have constructed
Soter, a tool implementation of the verification method, thereby
obtaining the first fully-automatic, infinite-state model checker for
a core fragment of Erlang. We find that in practice our abstraction
technique is accurate enough to verify an interesting range
of safety properties. Though the ACS coverability problem is
EXPSPACE-complete, Soter can analyse these verification problems
surprisingly efficiently.

Keywords Verification, Infinite-State Model Checking, Static
Analysis, Petri Nets, Erlang

1. Introduction 

This paper concerns the verification of concurrent programs written 
in Erlang. Originally designed to program fault-tolerant distributed 
systems at Ericsson in the late 80s, Erlang is now a widely used, 
open-sourced language with support for higher-order functions, 
concurrency, communication, distribution, on-the-fly code reload- 
ing, and multiple platforms [2, 3]. Largely because of a runtime 
system that offers highly efficient process creation and message- 
passing communication, Erlang is a natural fit for programming 
multicore CPUs, networked servers, parallel databases, GUIs, and 
monitoring, control and testing tools. 

The sequential part of Erlang is a higher-order, dynamically
typed, call-by-value functional language with pattern-matching
algebraic data types. Following the actor model [1], a concurrent
Erlang computation consists of a dynamic network of processes that
communicate by message passing. Every process has a unique process
identifier (pid), and is equipped with an unbounded mailbox.
Messages are sent asynchronously in the sense that send is
non-blocking. Messages are retrieved from the mailbox, not FIFO, but
First-In-First-Fireable-Out (FIFFO) via pattern-matching. A process
may block while waiting for a message that matches a certain
pattern to arrive in its mailbox. For a quick and highly readable
introduction to Erlang, see Armstrong's CACM article [2].

Challenges. Concurrent programs are hard to write. They are just
as hard to verify. In the case of Erlang programs, the inherent
complexity of the verification task can be seen from several diverse
sources of infinity in the state space.

(∞1) General recursion requires a (process local) call-stack.

(∞2) Higher-order functions are first-class values; closures can
be passed as parameters or returned.

(∞3) Data domains, and hence the message space, are unbounded:
functions may return, and variables may be bound to, terms of an
arbitrary size.

(∞4) An unbounded number of processes can be spawned dynamically.

(∞5) Mailboxes have unbounded capacity.

The challenge of verifying Erlang programs is that one must reason 
about the asynchronous communication of an unbounded set of 
messages, across an unbounded set of Turing-powerful processes. 

Our goal is to verify safety properties of Erlang-like programs
automatically, using a combination of static analysis and
infinite-state model checking. To a large extent, the key decision
of which causes of infinity to model as accurately as possible and
which to abstract is forced upon us: the class consisting of a fixed
set of context-free (equivalently, first-order) processes, each
equipped with a mailbox of size one and communicating messages from a
finite set, is already Turing powerful [10]. Our strategy is thus to
abstract (∞1), (∞2) and (∞3), while seeking to analyse
message-passing concurrency, assuming (∞4) and (∞5).

We consider programs of λActor, a prototypical functional
language with actor-style concurrency. λActor is essentially Core
Erlang [5], the official intermediate representation of Erlang code,
which exhibits in full the higher-order features of Erlang, with
asynchronous message-passing concurrency and dynamic process
creation.

With decidable infinite-state model checking in mind, we
introduce Actor Communicating System (ACS), which models the
interaction of an unbounded set of communicating processes. An ACS
has a finite set of control states $Q$, a finite set of pid classes
$P$, a finite set of messages $M$, and a finite set of transition
rules. An ACS transition rule has the shape
$\iota : q \xrightarrow{\ell} q'$, which means that a process
of pid class $\iota$ can transition from state $q$ to state $q'$
with (possible) communication side effect $\ell$, of which there are
four kinds, namely, (i) the process makes an internal transition;
(ii) it extracts and reads a message $m$ from its mailbox; (iii) it
sends a message $m$ to a process of pid class $\iota'$; (iv) it
spawns a process of pid class $\iota'$. ACS models are infinite
state: the mailbox of a process has unbounded capacity, and the
number of processes in an ACS may grow arbitrarily large. However
the set of pid classes is fixed, and processes of the same pid class
are not distinguishable.

An ACS can be interpreted naturally as a vector addition system
(VAS), or equivalently Petri net, using counter abstraction.
Recall that a VAS of dimension $n$ is given by a set of $n$-long
vectors of integers regarded as transition rules. A VAS defines a
state transition graph whose states are just $n$-long vectors of
non-negative integers. There is a transition from state $v$ to state
$v'$ just if $v' = v + r$ for some transition rule $r$. It is
well-known that the decision problems Coverability and LTL Model
Checking for VAS are EXPSPACE-complete; Reachability is decidable but
its complexity is open. We consider a particular counter abstraction
of ACS, called VAS semantics, which models an ACS as a VAS of
dimension $|P| \times (|Q| + |M|)$, distinguishing two kinds of
counters. A counter named by a pair $(\iota, q)$ counts the number of
processes of pid class $\iota$ that are currently in state $q$; a
counter named by $(\iota, m)$ counts the sum total of occurrences of
a message $m$ currently in the mailbox of $p$, where $p$ ranges over
processes of pid class $\iota$. Using this abstraction, we can
conservatively decide properties of the ACS using well-known decision
procedures for VAS.

Parametric, Flow-based Abstract Interpretation. The starting
point of our verification pathway is the abstraction of the sources of
infinity (∞1), (∞2) and (∞3). Methods such as k-CFA [32] can
be used to abstract higher-order recursive functions to a finite-state
system. Rather than 'baking in' each type of abstraction separately,
we develop a general abstract interpretation framework which is
parametric on a number of basic domains. In the style of Van Horn
and Might [33], we devise a machine-based operational semantics
of λActor that uses store-allocated continuations. The advantage
of such an indirection is that it enables the construction of a
machine semantics which is 'generated' from the basic domains of
Time, Mailbox and Data. We show that there is a simple notion of
sound abstraction of the basic domains whereby every such
abstraction gives rise to a sound abstract semantics of λActor
programs (Theorem 1). Further, if a given sound abstraction of the
basic domains is finite and the associated auxiliary operations are
computable, then the derived abstract semantics is finite and
computable.

Generating an Actor Communicating System. We study the
abstract semantics derived from a particular 0-CFA-like abstraction
of the basic domains. However we do not use it to verify properties
of λActor programs directly, as it is too coarse an abstraction to be
useful. Rather, we show that a sound ACS (Theorem 3) can be
constructed in polynomial time by bootstrapping from the 0-CFA-like
abstract semantics. Further, the dimension of the resulting ACS is
polynomial in the length of the input λActor program. The idea
is that the 0-CFA-like abstract (transition) semantics constitutes a
sound but rough analysis of the control-flow of the program, which
takes higher-order computation into account but communicating
behaviour only minimally. The bootstrap construction consists in
constraining these rough transitions with guards of the form
'receive a message of this type' or 'send a message of this type' or
'spawn a process', thus resulting in a more accurate abstract model
of the input λActor program in the form of an ACS.

Evaluation. To demonstrate the feasibility of our verification 
method, we have constructed a prototype implementation called 
Soter. Our empirical results show that the abstraction framework is 
accurate enough to verify an interesting range of safety properties 
of non-trivial Erlang programs. 

Outline. In Section 2 we define the syntax of λActor and
informally explain its semantics with the help of an example program.
In Section 3, we introduce Actor Communicating System and its VAS
semantics. In Section 4 we present a machine-based operational
semantics of λActor. In Section 5 we develop a general abstract
interpretation framework for λActor programs, parametric on a number
of basic domains. In Section 6, we use a particular instantiation
of the abstract interpretation to bootstrap the ACS construction. In
Section 7 we present the experimental results based on our tool
implementation Soter, and discuss the limitations of our approach.



Notation. We write $A^*$ for the set of finite sequences of elements
of the set $A$, and $\epsilon$ for the null sequence. Let $a \in A$
and $l, l' \in A^*$; we overload '$\cdot$' so that it means insertion
at the top $a \cdot l$, at the bottom $l \cdot a$ or concatenation
$l \cdot l'$. We write $l_i$ for the $i$-th element of $l$. The set
of finite partial functions from $A$ to $B$ is denoted
$A \rightharpoonup B$. Given $f : A \rightharpoonup B$ we define
$f[a \mapsto b] := (\lambda x.\ \text{if } (x = a) \text{ then } b \text{ else } f(x))$
and write $[]$ for the everywhere undefined function.

2. A Prototypical Fragment of Erlang 

In this section we introduce λActor, a prototypical untyped
functional language with actor concurrency. λActor is essentially
single-node Core Erlang [5] (the official intermediate
representation of Erlang code) without built-in functions and
fault-tolerant features. It exhibits in full the higher-order
features of Erlang, with message-passing concurrency and dynamic
process creation.

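Syntax. The syntax of λActor is defined as follows:

$$
\begin{array}{rcl}
e \in \mathit{Exp} & ::= & x \mid c(e_1, \ldots, e_n) \mid e_0(e_1, \ldots, e_n) \mid \mathit{fun} \\
 & \mid & \texttt{letrec}\ x_1 = \mathit{fun}_1.\ \cdots\ x_n = \mathit{fun}_n.\ \texttt{in}\ e \\
 & \mid & \texttt{case}\ e\ \texttt{of}\ \mathit{pat}_1 \to e_1;\ \ldots;\ \mathit{pat}_n \to e_n\ \texttt{end} \\
 & \mid & \texttt{receive}\ \mathit{pat}_1 \to e_1;\ \ldots;\ \mathit{pat}_n \to e_n\ \texttt{end} \\
 & \mid & \texttt{send}(e_1, e_2) \mid \texttt{spawn}(e) \mid \texttt{self}() \\
\mathit{fun} & ::= & \texttt{fun}(x_1, \ldots, x_n) \to e \\
\mathit{pat} & ::= & x \mid c(\mathit{pat}_1, \ldots, \mathit{pat}_n)
\end{array}
$$

where $c$ ranges over a finite set $\Sigma$ of constructors which we
consider fixed throughout the paper.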

For ease of comparison we keep the syntax close to Core Erlang
and use uncurried functions, delimiters, fun and end. We write '_'
for an unnamed unbound variable; using symbols from $\Sigma$, we
write n-tuples as $\{e_1, \ldots, e_n\}$, the list constructor cons
as [_|_] and the empty list as []. Sequencing $(e_1, e_2)$ is a
shorthand for $(\texttt{fun}(\_) \to e_2)(e_1)$ and we omit brackets
for nullary constructors. The character '%' marks the start of a
line of comment. Variable names begin with an uppercase letter,
except when bound by letrec. The free variables $fv(e)$ of an
expression are defined as usual. A λActor program $\mathcal{P}$ is
just a closed λActor expression.

Labels. For ease of reference to program points, we associate a
unique label to each sub-expression of a program. We write
$\ell : e$ to mean that $\ell$ is the label associated with $e$, and
we often omit the label altogether. Take a term
$\ell : (\ell_0 : e_0(\ell_1 : e_1, \ldots, \ell_n : e_n))$; we
define $\ell.\mathit{arg}_i := \ell_i$ and $\mathit{arity}(\ell) := n$.

Semantics. The semantics of λActor is defined in Section 4, but
we informally present a small-step reduction semantics here to give
an intuition of its model of concurrency. The rewrite rules for the
cases of function application and λ-abstraction are the standard
ones for call-by-value λ-calculus; we write evaluation contexts as
$E[\,]$.

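A state of the computation of a λActor program is a set $\Pi$ of
processes running in parallel. A process $\langle e \rangle^{\iota}_{m}$,
identified by the pid $\iota$, evaluates an expression $e$ with
mailbox $m$ holding the messages not yet consumed. Purely functional
reductions with no side-effect take place in each process,
independently interleaved. A spawn construct, spawn(fun() -> e),
evaluates to a fresh pid $\iota'$ (say), with the side-effect of the
creation of a new process, $\langle e \rangle^{\iota'}_{\epsilon}$,
with pid $\iota'$:

$$
\langle E[\texttt{spawn}(\texttt{fun}() \to e)] \rangle^{\iota}_{m} \parallel \Pi
\;\longrightarrow\;
\langle E[\iota'] \rangle^{\iota}_{m} \parallel \langle e \rangle^{\iota'}_{\epsilon} \parallel \Pi
$$

A send construct, send($\iota$, $v$), evaluates to the message $v$
with the side-effect of appending it to the mailbox of the receiver
process $\iota$; thus send is non-blocking:

$$
\langle E[\texttt{send}(\iota, v)] \rangle^{\iota'}_{m'} \parallel \langle e \rangle^{\iota}_{m} \parallel \Pi
\;\longrightarrow\;
\langle E[v] \rangle^{\iota'}_{m'} \parallel \langle e \rangle^{\iota}_{m \cdot v} \parallel \Pi
$$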



The evaluation of a receive construct,
receive $p_1 \to e_1 \ldots p_n \to e_n$ end, will block if the
mailbox of the process in question contains no message that matches
any of the patterns $p_i$. Otherwise, the first message $m$ that
matches a pattern, say $p_i$, is consumed by the process, and the
computation continues with the evaluation of $e_i$. The
pattern-matching variables in $e_i$ are bound by $\theta$ to the
corresponding matching subterms of the message $m$; if more than
one pattern matches the message, then (only) the first in textual
order is fired. Writing $m_1$ and $m_2$ for the parts of the mailbox
before and after $m$:

$$
\langle E[\texttt{receive}\ p_1 \to e_1 \ldots p_n \to e_n\ \texttt{end}] \rangle^{\iota}_{m_1 \cdot m \cdot m_2} \parallel \Pi
\;\longrightarrow\;
\langle E[\theta\, e_i] \rangle^{\iota}_{m_1 \cdot m_2} \parallel \Pi
$$



Note that message passing is not First-In-First-Out but rather
First-In-First-Fireable-Out (FIFFO): incoming messages are queued
at the end of the mailbox but the message that matches a receive
construct, and is subsequently extracted, is not necessarily the first
in the queue.

Example 1 (Locked Resource). Figure 1 shows an example
λActor program. The code has three logical parts, which would
constitute three modules in Erlang. The first part defines an Erlang
behaviour¹ that governs the lock-controlled, concurrent access of
a shared resource by a number of clients. A resource is viewed
as a function implementing a protocol that reacts to requests; the
function is called only when the lock is acquired. Note the use of
higher-order arguments and return values. The function res_start
creates a new process that runs an unlocked (res_free) instance
of the resource. When unlocked, a resource waits for a {lock, P}
message to arrive from a client P. Upon receipt of such a
message, an acknowledgement message is sent back to the client and
the control is yielded to res_locked. When locked (by a client P),
a resource can accept requests {req, P, Cmd} from P, and from P
only, for an unspecified command Cmd to be executed.

After running the requested command, the resource is expected
to return the updated resource handler and an answer, which may
be the atom ok, which requires no additional action, or a couple
{reply, Ans} which signals that the answer Ans should be sent
back to the client. When an unlock message is received from P
the control is given back to res_free. Note that the mailbox
matching mechanism allows multiple locks and requests to be sent
asynchronously to the mailbox of the locked resource without causing
conflicts: the pattern matching in the locked state ensures that all
the pending lock requests get delayed for later consumption once
the resource gets unlocked. The functions res_lock, res_unlock,
res_request, res_do encapsulate the locking protocol, hiding it from
the user who can then use this API as if it was purely functional.

The second part implements a simple 'shared memory cell' 
resource that holds a natural number, which is encoded using the 
constructors zero and {succ, _}, and allows a client to read its value
(the command read) or overwrite it with a new one (the {write, X}
command). Without locks, a shared resource with such a protocol 
easily leads to race conditions. 

The last part defines the function inc which accesses a locked 
cell to increment its value. The function add_to_cell adds M to 
the contents of the cell by spawning M processes incrementing 
it concurrently. Finally the entry-point of the program sets up a 
process with a shared locked cell and then calls add_to_cell . Note 
that N is a free variable; to make the example a program we can 
either close it by setting N to a constant or make it range over all 
natural numbers with the extension described in Section 5. 



    letrec
    %%% LOCKED RESOURCE MODULE
    res_start = fun(Res) -> spawn(fun() -> res_free(Res)).
    res_free = fun(Res) ->
        receive {lock, P} ->
            send(P, {acquired, self()}), res_locked(Res, P)
        end.
    res_locked = fun(Res, P) ->
        receive
            {req, P, Cmd} ->
                case Res(P, Cmd) of
                    {NewRes, ok} ->
                        res_locked(NewRes, P);
                    {NewRes, {reply, A}} ->
                        send(P, {ans, self(), A}),
                        res_locked(NewRes, P)
                end;
            {unlock, P} -> res_free(Res)
        end.
    % Locked Resource API
    res_lock = fun(Q) -> send(Q, {lock, self()}),
        receive {acquired, Q} -> ok end.
    res_unlock = fun(Q) -> send(Q, {unlock, self()}).
    res_request = fun(Q, Cmd) ->
        send(Q, {req, self(), Cmd}),
        receive {ans, Q, X} -> X end.
    res_do = fun(Q, Cmd) -> send(Q, {req, self(), Cmd}).
    %%% CELL IMPLEMENTATION MODULE
    cell_start = fun() -> res_start(cell(zero)).
    cell = fun(X) ->
        fun(_P, Cmd) ->
            case Cmd of
                {write, Y} -> {cell(Y), ok};
                read -> {cell(X), {reply, X}}
            end.
    % Cell API
    cell_lock = fun(C) -> res_lock(C).
    cell_unlock = fun(C) -> res_unlock(C).
    cell_read = fun(C) -> res_request(C, read).
    cell_write = fun(C, X) -> res_do(C, {write, X}).
    %%% INCREMENT CLIENT
    inc = fun(C) -> cell_lock(C),
        cell_write(C, {succ, cell_read(C)}),
        cell_unlock(C).
    add_to_cell = fun(M, C) ->
        case M of zero -> ok;
            {succ, M'} -> spawn(fun() -> inc(C)),
                add_to_cell(M', C)
        end.
    %%% ENTRY POINT
    in C = cell_start(), add_to_cell(N, C).

Figure 1. Locked Resource (running example)

¹ I.e. a module implementing a general purpose protocol, parametrised over
another module containing the code specific to a particular instance.




An interesting correctness property of this code is the mutual 
exclusion of the lock-protected region (i.e. line 47) of the concur- 
rent instances of inc. 

Remark 1. The following Core Erlang features are not captured by
λActor. (i) Module system, exception handling, arithmetic
primitives, built-in data types and I/O can be straightforwardly
translated or integrated into our framework. They are not treated
here because they are tied to the inner workings of the Erlang
runtime system. (ii) Timeouts in receives, registered processes and
type guards can be supported using suitable abstractions. (iii) A
proper treatment of monitor / link primitives and the multi-node
semantics will require a major extension of the concrete (and
abstract) semantics.

3. Actor Communicating Systems 

In this section we explore the design space of abstract models of
Erlang-style concurrency. We seek a model of computation that
captures the core concurrency and asynchronous communication
features of λActor and yet enjoys the decidability of interesting
verification problems. In the presence of pattern-matching
algebraic data types, the (sequential) functional fragment of λActor
is already Turing powerful [28]. Restricting it to a pushdown
(equivalently, first-order) fragment but allowing concurrent
execution would enable, using very primitive synchronization, the
simulation of a Turing-powerful finite automaton with two stacks. A
single finite-control process equipped with a mailbox (required for
asynchronous communication) can encode a Turing-powerful queue
automaton in the sense of Minsky. Thus constrained, we opt for
a model of concurrent computation that has finite control, a finite
number of messages, and a finite number of process classes.

Definition 1. An Actor Communicating System (ACS) $A$ is a tuple
$(P, Q, M, R, \iota_0, q_0)$ where $P$ is a finite set of
pid-classes, $Q$ is a finite set of control-states, $M$ is a finite
set of messages, $\iota_0 \in P$ is the pid-class of the initial
process, $q_0 \in Q$ is the initial state of the initial process, and
$R$ is a finite set of rules of the form
$\iota : q \xrightarrow{\ell} q'$ where $\iota \in P$,
$q, q' \in Q$ and $\ell$ is a label that can take one of four
possible forms:

- $\tau$, which represents an internal (sequential) transition of a
process of pid-class $\iota$

- $?m$ with $m \in M$: a process of pid-class $\iota$ extracts (and
reads) a message $m$ from its mailbox

- $\iota'!m$ with $\iota' \in P$, $m \in M$: a process of pid-class
$\iota$ sends a message $m$ to a process of pid-class $\iota'$

- $\nu\iota'.q''$ with $\iota' \in P$ and $q'' \in Q$: a process of
pid-class $\iota$ spawns a new process of pid-class $\iota'$ that
starts executing from $q''$

Now we have to give ACS a semantics, but interpreting the ACS 
mailboxes as FIFFO queues would yield a Turing-powerful model. 
Our solution is to apply a counter abstraction on mailboxes: dis- 
regard the ordering of messages, but track the number of occur- 
rences of every message in a mailbox. Since we bound the number 
of pid-classes, but wish to model dynamic (and hence unbounded) 
spawning of processes, we apply a second counter abstraction on 
the control states of each pid-class: we count, for each control-state 
of each pid-class, the number of processes in that pid-class that are 
currently in that state. 

It is important to make sure that such an abstraction contains
all the behaviours of the semantics that uses FIFFO mailboxes:
if there is a term in the mailbox that matches a pattern, then the
corresponding branch is non-deterministically fired. To see the
difference, take the ACS that has one process (named $\iota$), three
control states $q$, $q_1$ and $q_2$, and two rules
$\iota : q \xrightarrow{?a} q_1$ and $\iota : q \xrightarrow{?b} q_2$.
When equipped with a FIFFO mailbox containing the sequence
$c\,a\,b$, the process can only evolve from $q$ to $q_1$ by consuming
$a$ from the mailbox, since it can skip $c$ but will find a matching
message (and thus not look further into the mailbox) before reaching
the message $b$. In contrast, the VAS semantics would let $q$ evolve
non-deterministically to both $q_1$ and $q_2$, consuming $a$ or $b$
respectively: the mailbox is abstracted to
$[a \mapsto 1, b \mapsto 1, c \mapsto 1]$ with no information on
whether $a$ or $b$ arrived first. However, the abstracted
semantics does contain the traces of the FIFFO semantics.
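
To make the comparison concrete, the following Python sketch replays the example above. It is an illustration only and not part of the formal development: the dictionary RULES and the two helper functions are hypothetical encodings of the two receive rules, and states and mailboxes are represented naively.

```python
from collections import Counter

# Hypothetical encoding of the two receive rules of the example:
# q --?a--> q1 and q --?b--> q2 (message -> target control state).
RULES = {"a": "q1", "b": "q2"}

def fiffo_successors(mailbox):
    """FIFFO: scan in arrival order and fire on the first receivable message."""
    for i, msg in enumerate(mailbox):
        if msg in RULES:
            rest = mailbox[:i] + mailbox[i + 1:]
            return {(RULES[msg], rest)}
    return set()  # nothing matches: the receive blocks

def counter_successors(mailbox):
    """Counter abstraction: order is forgotten, so any receivable
    message that is present may be consumed."""
    counts = Counter(mailbox)
    result = set()
    for msg, target in RULES.items():
        if counts[msg] > 0:
            remaining = counts - Counter({msg: 1})
            result.add((target, frozenset(remaining.items())))
    return result

fiffo = fiffo_successors("cab")
abstract = counter_successors("cab")
print(sorted(state for state, _ in fiffo))      # control states reachable under FIFFO
print(sorted(state for state, _ in abstract))   # control states reachable after abstraction
```

Under these assumptions the FIFFO successor set is contained in the abstract one (on control states), which is the soundness direction the paragraph above relies on.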

The VAS semantics of an ACS is a state transition system
equipped with counters (with values in $\mathbb{N}$) that support
increment and decrement (when non-zero) operations. Such
infinite-state systems are known as vector addition systems (VAS),
which are equivalent to Petri nets.

Definition 2 (Vector Addition System). (i) A vector addition
system (VAS) $V$ is a pair $(I, R)$ where $I$ is a finite set of
indices (called the places of the VAS) and $R \subseteq \mathbb{Z}^I$
is a finite set of rules. Thus a rule is just a vector of integers of
dimension $|I|$, whose components are indexed (i.e. named) by the
elements of $I$.
(ii) The state transition system $[\![V]\!]$ induced by a VAS
$V = (I, R)$ has state-set $\mathbb{N}^I$ and transition relation

$$\{(v, v + r) \mid v \in \mathbb{N}^I,\ r \in R,\ v + r \in \mathbb{N}^I\}.$$

We write $v \le v'$ just if for all $i$ in $I$, $v(i) \le v'(i)$.
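
As a small illustration of Definition 2 (a sketch only, with hypothetical names), a VAS state can be represented in Python as a dictionary from places to non-negative integers, and a rule as a sparse dictionary of integer deltas. The place names "p" and "q" below are made up for the example.

```python
def vas_successors(state, rules):
    """All states v + r, for r in rules, whose components stay non-negative."""
    succs = []
    for r in rules:
        # Places not mentioned by a rule are left unchanged (delta 0).
        nxt = {place: state[place] + r.get(place, 0) for place in state}
        if all(n >= 0 for n in nxt.values()):
            succs.append(nxt)
    return succs

def leq(v, w):
    """Componentwise order on states: v <= w."""
    return all(v[place] <= w[place] for place in v)

# A two-place VAS: the first rule turns one token in "p" into two in "q",
# the second rule consumes one token from "q".
rules = [{"p": -1, "q": 2}, {"q": -1}]
v = {"p": 1, "q": 0}
print(vas_successors(v, rules))
```

From the state with one token in "p" only the first rule is enabled, since the second would drive "q" below zero.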

The semantics of an ACS can now be given easily in terms of a 
corresponding underlying vector addition system: 

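Definition 3 (VAS semantics). The semantics of an ACS
$A = (P, Q, M, R, \iota_0, q_0)$ is the transition system induced by
the VAS $V = (I, \mathbf{R})$ where $I = P \times (Q \uplus M)$ and
$\mathbf{R} = \{\hat{r} \mid r \in R\}$. The transformation
$r \mapsto \hat{r}$ is defined as follows.²

$$
\begin{array}{ll}
\text{ACS rule } r & \text{VAS rule } \hat{r} \\
\hline
\iota : q \xrightarrow{\tau} q' & [(\iota, q) \mapsto -1,\ (\iota, q') \mapsto 1] \\
\iota : q \xrightarrow{?m} q' & [(\iota, q) \mapsto -1,\ (\iota, q') \mapsto 1,\ (\iota, m) \mapsto -1] \\
\iota : q \xrightarrow{\iota'!m} q' & [(\iota, q) \mapsto -1,\ (\iota, q') \mapsto 1,\ (\iota', m) \mapsto 1] \\
\iota : q \xrightarrow{\nu\iota'.q''} q' & [(\iota, q) \mapsto -1,\ (\iota, q') \mapsto 1,\ (\iota', q'') \mapsto 1]
\end{array}
$$
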



Given a $[\![V]\!]$-state $v \in \mathbb{N}^I$, the component
$v(\iota, q)$ counts the number of processes in the pid-class $\iota$
currently in state $q$, while the component $v(\iota, m)$ is the sum
of the number of occurrences of the message $m$ in the mailboxes of
the processes of the pid-class $\iota$.
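
The translation of Definition 3 can be sketched in Python as follows. This is an illustrative encoding under our own assumptions, not Soter's implementation: the Rule record, its field names and the "state"/"msg" tags (which keep the two summands of the disjoint union of control states and messages apart) are all hypothetical. One caveat of the sketch: for a self-loop (src equal to dst) the two updates on the same place cancel out.

```python
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    pid_class: str                 # iota: class of the process that moves
    src: str                       # q
    dst: str                       # q'
    kind: str                      # "tau", "recv", "send" or "spawn"
    msg: Optional[str] = None      # m, used by recv and send
    target: Optional[str] = None   # iota', used by send and spawn
    start: Optional[str] = None    # q'', used by spawn

def to_vas_rule(r: Rule) -> dict:
    """Sparse VAS vector for an ACS rule; unspecified places are zero."""
    vec: dict = {}

    def bump(place, delta):
        vec[place] = vec.get(place, 0) + delta

    bump((r.pid_class, "state", r.src), -1)   # one process leaves q
    bump((r.pid_class, "state", r.dst), +1)   # and enters q'
    if r.kind == "recv":
        bump((r.pid_class, "msg", r.msg), -1)  # consume m from own class mailbox
    elif r.kind == "send":
        bump((r.target, "msg", r.msg), +1)     # deposit m in target class mailbox
    elif r.kind == "spawn":
        bump((r.target, "state", r.start), +1) # new process of class iota' in q''
    elif r.kind != "tau":
        raise ValueError(f"unknown rule kind: {r.kind}")
    return vec

lock_req = Rule("client", "c0", "c1", "send", msg="lock", target="res")
grab = Rule("res", "free", "locked", "recv", msg="lock")
print(to_vas_rule(lock_req))
print(to_vas_rule(grab))
```

The pid-class and state names used in the two example rules are invented for the illustration; they loosely mimic a client requesting a lock from a resource.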



Although VAS are infinite-state, many non-trivial properties are
decidable on them, including reachability, coverability and place
boundedness; for more details see [13]. In this paper we focus on
coverability, which is EXPSPACE-complete [30]: given two states $s$
and $t$, is it possible to reach from $s$ a state $t'$ that covers
$t$ (i.e. $t \le t'$)?
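
For intuition, coverability can be decided by the classical backward algorithm for well-structured systems, sketched below in Python. This is a textbook procedure shown under simplifying assumptions (states and rules as equal-length integer tuples, hypothetical function names); it is not a description of the procedure used by Soter. The set of states that can cover the target is upward-closed, so it is represented by a finite basis of minimal elements, and termination follows from Dickson's lemma.

```python
def covers(big, small):
    """True if big >= small componentwise."""
    return all(b >= s for b, s in zip(big, small))

def coverable(rules, source, target):
    """Decide whether some state >= target is reachable from source."""
    basis = [tuple(target)]      # minimal elements of the backward-reachable set
    frontier = [tuple(target)]
    while frontier:
        t = frontier.pop()
        for r in rules:
            # Least v >= 0 such that v + r >= t.
            pre = tuple(max(ti - ri, 0) for ti, ri in zip(t, r))
            if any(covers(pre, b) for b in basis):
                continue         # already represented by a smaller element
            basis = [b for b in basis if not covers(b, pre)] + [pre]
            frontier.append(pre)
    return any(covers(source, b) for b in basis)

# Two places: (processes waiting, processes in the critical region).
rules = [(-1, 1), (1, -1)]
print(coverable(rules, (1, 0), (0, 2)))   # can two processes be critical at once?
print(coverable(rules, (2, 0), (0, 2)))
```

In the toy net the total number of tokens is invariant, so starting from a single process the state with two processes in the critical region is not coverable, while it is when starting from two.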

Which kinds of correctness properties of λActor programs can
one specify by coverability of an ACS? We will be using ACS to
over-approximate the semantics of a λActor program, so if a state
of the ACS is not coverable, then it is not reachable in any
execution of the program. It follows that we can use coverability to
express safety properties such as: (i) unreachability of error
program locations; (ii) mutual exclusion; (iii) boundedness of
mailboxes: is it possible to reach a state where the mailbox of
pid-class $\iota$ has more than $k$ messages? If not, we can allocate
just $k$ memory cells for that mailbox.

4. An Operational Semantics for λActor

In this section, we define an operational semantics for λActor
using a time-stamped CESK* machine, following a methodology
advocated by Van Horn and Might [33]. An unusual feature of
such machines is store-allocated continuations, which allow the
recursion in a program's control flow and data structure to be
separated from the recursive structure in its state space. As we shall
illustrate in Section 5, such a formalism is key to a transparently
sound and parametric abstract interpretation.

² All unspecified components of the vectors $\hat{r}$ as defined in
the table are set to zero.

A Concrete Machine Semantics. Without loss of generality, we
assume that in a λActor program, variables are distinct, and
constructors and cases are only applied to (bound) variables. The
λActor machine defines a transition system on (global) states,
which are elements of the set State

$$
\begin{array}{rcl}
s \in \mathit{State} & := & \mathit{Procs} \times \mathit{Mailboxes} \times \mathit{Store} \\
\pi \in \mathit{Procs} & := & \mathit{Pid} \rightharpoonup \mathit{ProcState} \\
\mu \in \mathit{Mailboxes} & := & \mathit{Pid} \rightharpoonup \mathit{Mailbox}
\end{array}
$$

An element of Procs associates a process with its (local) state, and
an element of Mailboxes associates a process with its mailbox. We
split the Store into two partitions

$$\sigma \in \mathit{Store} := (\mathit{VAddr} \rightharpoonup \mathit{Value}) \times (\mathit{KAddr} \rightharpoonup \mathit{Kont})$$

each with its address space, to separate values and continuations.
By abuse of notation $\sigma(x)$ shall mean the application of the
first component when $x \in \mathit{VAddr}$ and of the second when
$x \in \mathit{KAddr}$. The local state of a process

$$q \in \mathit{ProcState} := (\mathit{ProgLoc} \uplus \mathit{Pid}) \times \mathit{Env} \times \mathit{KAddr} \times \mathit{Time}$$

is a tuple, consisting of (i) a pid, or a program location³ which
is a subterm of the program, labelled with its occurrence; whenever
it is clear from the context, we shall omit the label; (ii) an
environment, which is a map from variables to pointers to values
$\rho \in \mathit{Env} := \mathit{Var} \rightharpoonup \mathit{VAddr}$;
(iii) a pointer to a continuation, which indicates what to evaluate
next when the current evaluation returns a value; (iv) a time-stamp,
which will be described later. Values are either closures or pids:

$$d \in \mathit{Value} := \mathit{Closure} \uplus \mathit{Pid} \qquad \mathit{Closure} := \mathit{ProgLoc} \times \mathit{Env}$$

Note that, as defined, closures include both functions (which is 
standard) as well as constructor terms. 

All the domains we define are naturally partially ordered: 
ProgLoc and Var are discrete partial orders, all the others are 
defined by the appropriate pointwise extensions. 

Mailbox and Message Passing. A mailbox is just a finite sequence
of values: $m \in \mathit{Mailbox} := \mathit{Value}^*$. We denote
the empty mailbox by $\epsilon$. A mailbox is supported by two
operations:

$$
\begin{array}{l}
\mathit{mmatch} : \mathit{pat}^* \times \mathit{Mailbox} \times \mathit{Env} \times \mathit{Store} \to (\mathbb{N} \times (\mathit{Var} \rightharpoonup \mathit{Value}) \times \mathit{Mailbox})_{\bot} \\
\mathit{enq} : \mathit{Value} \times \mathit{Mailbox} \to \mathit{Mailbox}
\end{array}
$$

The function mmatch takes a list of patterns, a mailbox, the current
environment and a store (for resolving pointers in the values stored
in the mailbox) and returns the index of the matching pattern, a
substitution witnessing the match, and the mailbox resulting from
the extraction of the matched message. To model Erlang-style
FIFFO mailboxes we set $\mathit{enq}(d, m) := m \cdot d$ and define:

$$\mathit{mmatch}(p_1 \ldots p_n, m, \rho, \sigma) := (i, \theta, m_1 \cdot m_2)$$

such that

$$
\begin{array}{ll}
m = m_1 \cdot d \cdot m_2 & \forall d' \in m_1 .\ \forall j .\ \mathit{match}_{\rho,\sigma}(p_j, d') = \bot \\
\theta = \mathit{match}_{\rho,\sigma}(p_i, d) & \forall j < i .\ \mathit{match}_{\rho,\sigma}(p_j, d) = \bot
\end{array}
$$

where $\mathit{match}_{\rho,\sigma}(p, d)$ seeks to match the term
$d$ against the pattern $p$, following the pointers $\rho$ to the
store $\sigma$ if necessary, and returning the witnessing
substitution if matchable, and $\bot$ otherwise.
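
The behaviour of mmatch and enq can be sketched in Python as follows. The sketch makes several simplifying assumptions that are ours and not the paper's: messages are closed terms (atoms as strings, constructor terms as tuples) so no store lookup is needed, the environment maps variable names directly to terms, pattern indices are 0-based, and None plays the role of $\bot$. The class Var and the example messages are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str  # pattern variable; "_" is the anonymous variable

def match(pat, term, env):
    """Return the witnessing substitution (a dict), or None if no match."""
    subst = {}

    def go(p, t):
        if isinstance(p, Var):
            if p.name == "_":
                return True
            bound = env.get(p.name, subst.get(p.name))
            if bound is None:          # fresh variable: bind it
                subst[p.name] = t
                return True
            return bound == t          # already bound: must agree
        if isinstance(p, tuple):       # constructor term: match componentwise
            return (isinstance(t, tuple) and len(p) == len(t)
                    and all(go(pi, ti) for pi, ti in zip(p, t)))
        return p == t                  # atom

    return subst if go(pat, term) else None

def mmatch(patterns, mailbox, env):
    """FIFFO extraction: oldest matching message, first pattern in textual order."""
    for k, msg in enumerate(mailbox):
        for i, pat in enumerate(patterns):
            theta = match(pat, msg, env)
            if theta is not None:
                return i, theta, mailbox[:k] + mailbox[k + 1:]
    return None  # no message matches any pattern: the receive blocks

def enq(d, mailbox):
    """Append a message at the end of the mailbox."""
    return mailbox + [d]

# A resource locked by client1 skips the pending lock request of client2.
env = {"P": "client1"}
patterns = [("req", Var("P"), Var("Cmd")), ("unlock", Var("P"))]
mailbox = enq(("unlock", "client1"),
              [("lock", "client2"), ("req", "client1", "read")])
print(mmatch(patterns, mailbox, env))
```

The example mirrors the locked state of the running example: the lock request from another client stays in the mailbox while the request from the lock holder is extracted.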

Evaluation Contexts as Continuations. Next we represent (in an
inside-out manner) evaluation contexts as continuations. A
continuation consists of a tag indicating the shape of the evaluation
context, a pointer to a continuation representing the enclosing
evaluation context, and, in some cases, a program location and an
environment. Thus $\kappa \in \mathit{Kont}$ consists of the
following constructs:



3 Precisely a program location is a node in the abstract syntax tree of the 
program being analysed. 



- Stop represents the empty context.

- $\mathit{Arg}_i\langle \ell, v_0 \ldots v_{i-1}, \rho, a \rangle$
represents the context

$$E[v_0(v_1, \ldots, v_{i-1}, [\,], e'_{i+1}, \ldots, e'_n)]$$

where $e_0(e_1, \ldots, e_n)$ is the subterm located at $\ell$;
$\rho$ closes the terms $e_{i+1}, \ldots, e_n$ to
$e'_{i+1}, \ldots, e'_n$ respectively; the address $a$ points to
the continuation representing the enclosing evaluation context $E$.

Addresses, Pids and Time-Stamps. While the machine supports
arbitrary concrete representations of time-stamps, addresses and
pids, we present here an instance based on contours [32] which
shall serve as the reference semantics of λActor, and the basis for
the abstraction of Section 5.

A way to represent a dynamic occurrence of a symbol is the
history of the computation at the point of its creation. We record
history as contours which are strings of program locations

$$t \in \mathit{Time} := \mathit{ProgLoc}^*$$

The initial contour is just the empty sequence $t_0 := \epsilon$,
while the tick function updates the contour of the process in
question by prepending the current program location, which is always
a function call (see rule Apply):

$$\mathit{tick} : \mathit{ProgLoc} \times \mathit{Time} \to \mathit{Time} \qquad \mathit{tick}(\ell, t) := \ell \cdot t$$

Addresses for values ($b \in \mathit{VAddr}$) are represented by
tuples comprising the current pid, the variable in question, the
bound value and the current time stamp. Addresses for continuations
($a, c \in \mathit{KAddr}$) are represented by tuples comprising the
current pid, program location, environment and time (i.e. contour);
or $\ast$ which is the address of the initial continuation (Stop).

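$$
\begin{array}{rcl}
\mathit{VAddr} & := & \mathit{Pid} \times \mathit{Var} \times \mathit{Data} \times \mathit{Time} \\
\mathit{KAddr} & := & (\mathit{Pid} \times \mathit{ProgLoc} \times \mathit{Env} \times \mathit{Time}) \uplus \{\ast\}
\end{array}
$$

The data domain ($\delta \in \mathit{Data}$) is the set of closed
λActor terms; the function
$\mathit{res} : \mathit{Store} \times \mathit{Value} \to \mathit{Data}$
resolves all the pointers of a value through the store $\sigma$,
returning the corresponding closed term:

$$
\begin{array}{rcl}
\mathit{res}(\sigma, \iota) & := & \iota \\
\mathit{res}(\sigma, (e, \rho)) & := & e[x \mapsto \mathit{res}(\sigma, \sigma(\rho(x))) \mid x \in fv(e)]
\end{array}
$$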

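New addresses are allocated by extracting the relevant
components from the context at that point:

$$
\begin{array}{l}
\mathit{new}_{\mathit{kpush}} : \mathit{Pid} \times \mathit{ProcState} \to \mathit{KAddr} \\
\mathit{new}_{\mathit{kpush}}(\iota, \langle \ell, \rho, \_, t \rangle) := (\iota, \ell.\mathit{arg}_0, \rho, t) \\[4pt]
\mathit{new}_{\mathit{kpop}} : \mathit{Pid} \times \mathit{Kont} \times \mathit{ProcState} \to \mathit{KAddr} \\
\mathit{new}_{\mathit{kpop}}(\iota, \kappa, \langle \_, \_, \_, t \rangle) := (\iota, \ell.\mathit{arg}_{i+1}, \rho, t)
\quad \text{where } \kappa = \mathit{Arg}_i\langle \ell, \ldots, \rho, \_ \rangle \\[4pt]
\mathit{new}_{\mathit{va}} : \mathit{Pid} \times \mathit{Var} \times \mathit{Data} \times \mathit{ProcState} \to \mathit{VAddr} \\
\mathit{new}_{\mathit{va}}(\iota, x, \delta, \langle \_, \_, \_, t \rangle) := (\iota, x, \delta, t)
\end{array}
$$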

Remark 2. To enable data abstraction in our framework, the address 
of a value contains the data to which the variable is bound: by 
making appropriate use of the embedded information in the abstract 
semantics, we can fine-tune the data-sensitivity of our analysis, as 
we shall illustrate in Section 5. However when no data abstraction 
is intended, this data component can safely be discarded. 

Following the same scheme, pids ($\iota \in \mathit{Pid}$) can be identified
with the contour of the spawn that generated them:
$\mathit{Pid} := (\mathit{ProgLoc} \times \mathit{Time})$. Thus the generation of a new pid is defined as

$\mathrm{new}_{pid} : \mathit{Pid} \times \mathit{ProgLoc} \times \mathit{Time} \to \mathit{Pid}$

$\mathrm{new}_{pid}((\ell', t'), \ell, t) := (\ell, \mathrm{tick}^*(t, \mathrm{tick}(\ell', t')))$

where $\mathrm{tick}^*$ is just the simple extension of tick that prepends a
whole sequence to another. Note that the new pid contains the pid
that created it as a sub-sequence: it is indeed part of its history
(dynamic context). The pid $\iota_0 := (\ell_0, \epsilon)$ is the pid associated with
the starting process, where $\ell_0$ is just the root of the program.
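The contour-based definitions of tick and of pid generation can be sketched in a few lines of Python. Contours are modelled as tuples of strings; the location names `"l0"`, `"call1"` and `"spawn1"` are made up for the example.

```python
from typing import Tuple

ProgLoc = str
Time = Tuple[ProgLoc, ...]        # a contour: a string of program locations
Pid = Tuple[ProgLoc, Time]        # Pid := ProgLoc x Time

T0: Time = ()                     # the initial contour is the empty sequence

def tick(loc: ProgLoc, t: Time) -> Time:
    """tick(l, t) := l . t  (prepend the current program location)."""
    return (loc,) + t

def tick_star(prefix: Time, t: Time) -> Time:
    """Extension of tick that prepends a whole sequence to another."""
    return prefix + t

def new_pid(parent: Pid, loc: ProgLoc, t: Time) -> Pid:
    """new_pid((l', t'), l, t) := (l, tick*(t, tick(l', t')))."""
    parent_loc, parent_time = parent
    return (loc, tick_star(t, tick(parent_loc, parent_time)))

root: Pid = ("l0", T0)            # pid of the starting process
t = tick("call1", T0)             # the root performs one function call ...
child = new_pid(root, "spawn1", t)  # ... and then spawns at location spawn1
```

Here `child` is `("spawn1", ("call1", "l0"))`: the contour of the new pid ends with the history of its creator, as noted above.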

Remark 3. (i) Note that the only sources of infinity for the state 
space are time, mailboxes and the data component of value ad- 
dresses. If these domains are finite then the state space is finite and 
hence reachability is decidable. (ii) It is possible to present a more 
general version of the concrete machine semantics. We can reor- 
ganise the machine semantics so that components such as Time, 
Pid, Mailbox, KAddr and VAddr are presented as parameters 
(which may be instantiated as the situation requires). In this paper 
we present a contour-based machine, which is general enough to 
illustrate our method of verification. 

Definition 4 (Concrete Semantics). Now that the state space is
set up, we define a (non-deterministic) transition relation on states
$(\to) \subseteq \mathit{State} \times \mathit{State}$. In Figure 2 we present the rules for
application, message passing and process creation; we omit the
other rules (letrec, case and treatment of pids as returned value)
since they follow the same shape. The transition $s \to s'$ is defined
by a case analysis of the shape of $s$.

The rules for the purely functional reductions are a simple lifting
of the corresponding rules for the sequential CESK* machine.
Vars: when the currently selected process is evaluating a variable,
its address is looked up in the environment and the corresponding
value is fetched from the store and returned. Apply: when evaluating
an application, control is given to each argument (including
the function to be applied) in turn; FunEval and ArgEval are then
applied, collecting the values in the continuation. After all arguments
have been evaluated, new values are recorded in the environment
(and the store), and control is given to the body of the function
to be applied. The rule Receive can only fire if mmatch returns
a valid match from the mailbox of the process. In case there is a
match, control is passed to the expression in the matching clause,
and the substitution $\theta$ witnessing the match is used to generate the
bindings for the variables of the pattern. When applying a send
(rule Send), the recipient's pid is first extracted from the continuation, and
enq is then called to dispatch the evaluated message to the designated
mailbox. When applying a spawn (rule Spawn), the argument must
be an evaluated nullary function; a new process with a fresh pid is
then created whose code is the body of the function.

One can easily add rules for run-time errors such as wrong arity
in function application, non-exhaustive patterns in cases, sending
to a non-pid and spawning a non-function.

5. Parametric Abstract Interpretation 

We aim to abstract the concrete operational semantics of Section 4 
isolating the least set of domains that need to be made finite in order 
for the abstraction to be decidable. We then state the conditions on 
these abstract domains that are sufficient for soundness. 

In Remark 3 we identify Time, Mailbox and Data as responsi- 
ble for the unboundedness of the state space. Our abstract semantics 
is thus parametric on the abstraction of these basic domains. 

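Definition 5 (Basic domains abstraction). (i) A data abstraction
is a triple $\mathcal{D} = \langle \widehat{\mathit{Data}}, \alpha_d, \widehat{\mathrm{res}} \rangle$ where $\widehat{\mathit{Data}}$ is a flat (i.e. discretely
ordered) domain of abstract data values, $\alpha_d : \mathit{Data} \to \widehat{\mathit{Data}}$ and
$\widehat{\mathrm{res}} : \widehat{\mathit{Store}} \times \widehat{\mathit{Value}} \to \mathcal{P}(\widehat{\mathit{Data}})$.

(ii) A time abstraction is a tuple $\mathcal{T} = \langle \widehat{\mathit{Time}}, \alpha_t, \widehat{\mathrm{tick}}, \hat{t}_0 \rangle$ where
$\widehat{\mathit{Time}}$ is a flat domain of abstract contours, $\alpha_t : \mathit{Time} \to \widehat{\mathit{Time}}$,
$\hat{t}_0 \in \widehat{\mathit{Time}}$, and $\widehat{\mathrm{tick}} : \mathit{ProgLoc} \times \widehat{\mathit{Time}} \to \widehat{\mathit{Time}}$.

(iii) A mailbox abstraction is a tuple
$\mathcal{M} = \langle \widehat{\mathit{Mailbox}}, \le_m, \sqcup_m, \alpha_m, \widehat{\mathrm{enq}}, \hat{\epsilon}, \widehat{\mathrm{mmatch}} \rangle$
where $(\widehat{\mathit{Mailbox}}, \le_m, \sqcup_m)$ is a join-semilattice with least element
$\hat{\epsilon} \in \widehat{\mathit{Mailbox}}$, and $\alpha_m : \mathit{Mailbox} \to \widehat{\mathit{Mailbox}}$ and
$\widehat{\mathrm{enq}} : \widehat{\mathit{Value}} \times \widehat{\mathit{Mailbox}} \to \widehat{\mathit{Mailbox}}$ are monotone in mailboxes.

$\widehat{\mathrm{mmatch}} : \mathit{pat}^* \times \widehat{\mathit{Mailbox}} \times \widehat{\mathit{Env}} \times \widehat{\mathit{Store}} \to \mathcal{P}(\mathbb{N} \times (\mathit{Var} \rightharpoonup \widehat{\mathit{Value}}) \times \widehat{\mathit{Mailbox}})$

(iv) A basic domains abstraction is a triple $\mathcal{I} = \langle \mathcal{D}, \mathcal{T}, \mathcal{M} \rangle$
consisting of a data, a time and a mailbox abstraction.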

An abstract interpretation of the basic domains determines an 
interpretation of the other abstract domains as follows. 



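$\widehat{\mathit{State}} := \widehat{\mathit{Procs}} \times \widehat{\mathit{Mailboxes}} \times \widehat{\mathit{Store}}$
$\widehat{\mathit{Procs}} := \widehat{\mathit{Pid}} \to \mathcal{P}(\widehat{\mathit{ProcState}})$
$\widehat{\mathit{ProcState}} := (\mathit{ProgLoc} \uplus \widehat{\mathit{Pid}}) \times \widehat{\mathit{Env}} \times \widehat{\mathit{KAddr}} \times \widehat{\mathit{Time}}$
$\widehat{\mathit{Store}} := (\widehat{\mathit{VAddr}} \to \mathcal{P}(\widehat{\mathit{Value}})) \times (\widehat{\mathit{KAddr}} \to \mathcal{P}(\widehat{\mathit{Kont}}))$
$\widehat{\mathit{Mailboxes}} := \widehat{\mathit{Pid}} \to \widehat{\mathit{Mailbox}}$
$\widehat{\mathit{Value}} := \widehat{\mathit{Closure}} \uplus \widehat{\mathit{Pid}}$
$\widehat{\mathit{Closure}} := \mathit{ProgLoc} \times \widehat{\mathit{Env}}$
$\widehat{\mathit{Env}} := \mathit{Var} \rightharpoonup \widehat{\mathit{VAddr}}$
$\widehat{\mathit{Pid}} := (\mathit{ProgLoc} \times \widehat{\mathit{Time}}) \uplus \{\hat{\iota}_0\} \qquad \hat{\iota}_0 := \hat{t}_0$

each equipped with an abstraction function defined by an appropriate
pointwise extension. We will call all of them $\alpha$ since it will not
introduce ambiguities. The abstract domain $\widehat{\mathit{Kont}}$ is the pointwise
abstraction of $\mathit{Kont}$, and we will use the same tags as those in the
concrete domain. The abstract functions $\widehat{\mathrm{new}}_{kpush}$, $\widehat{\mathrm{new}}_{kpop}$, $\widehat{\mathrm{new}}_{va}$
and $\widehat{\mathrm{new}}_{pid}$ are defined exactly as their concrete versions, but on
the abstract domains.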

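When $B$ is a flat domain, the abstraction of a partial map
$C = A \rightharpoonup B$ to $\widehat{C} = \widehat{A} \to \mathcal{P}(\widehat{B})$ is defined as

$\alpha_C(f) := \lambda \hat{a} \in \widehat{A}.\ \{\alpha_B(b) \mid (a, b) \in f \text{ and } \alpha_A(a) = \hat{a}\}$

where the preorder on $\widehat{C}$ is $\hat{f} \le_{\widehat{C}} \hat{g} \iff \forall \hat{a}.\ \hat{f}(\hat{a}) \subseteq \hat{g}(\hat{a})$.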
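The pointwise abstraction of a partial map can be sketched as follows. Dictionaries stand in for partial maps, and the two abstraction functions used in the example (dropping a time-stamp from an address, classifying integers as "small" or "big") are illustrative assumptions only.

```python
from collections import defaultdict

def abstract_map(f, alpha_a, alpha_b):
    """alpha_C(f): send each abstract key to the set of abstractions of
    the values stored under any concrete key that maps to it."""
    out = defaultdict(set)
    for a, b in f.items():
        out[alpha_a(a)].add(alpha_b(b))
    return dict(out)

def leq(f_hat, g_hat):
    """Preorder on abstract maps: pointwise set inclusion.
    Abstract keys missing from a dict denote the empty set."""
    return all(vals <= g_hat.get(a, set()) for a, vals in f_hat.items())

# Concrete addresses are (variable, time-stamp) pairs in this toy example.
concrete = {("x", 1): 10, ("x", 2): 30, ("y", 1): 30}
drop_time = lambda addr: addr[0]
size_class = lambda n: "small" if n < 25 else "big"

abstract = abstract_map(concrete, drop_time, size_class)
```

Two concrete bindings of `x` collapse onto the same abstract address, which is why abstract stores map addresses to sets of values.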

The operations on the parameter domains need to 'behave' with 
respect to the abstraction functions: the standard correctness condi- 
tions listed below must be satisfied by their instances. These con- 
ditions amount to requiring that what we get from an application 
of a concrete auxiliary function is adequately represented by the 
abstract result of the application of the abstract counterpart of that 
auxiliary function. The partial orders on the domains are standard 
pointwise extensions of partial orders of the parameter domains. 

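Definition 6 (Sound basic domains abstraction). A basic domains
abstraction $\mathcal{I}$ is sound just if the following conditions are met by
the auxiliary operations:

$\alpha_t(\mathrm{tick}(\ell, t)) \le \widehat{\mathrm{tick}}(\ell, \alpha_t(t))$ (1)

$\hat{\sigma} \le \hat{\sigma}' \wedge \hat{d} \le \hat{d}' \implies \widehat{\mathrm{res}}(\hat{\sigma}, \hat{d}) \le \widehat{\mathrm{res}}(\hat{\sigma}', \hat{d}')$ (2)

$\forall \hat{\sigma} \ge \alpha(\sigma).\ \alpha_d(\mathrm{res}(\sigma, d)) \in \widehat{\mathrm{res}}(\hat{\sigma}, \alpha(d))$ (3)

$\alpha_m(\mathrm{enq}(d, m)) \le \widehat{\mathrm{enq}}(\alpha(d), \alpha_m(m)) \qquad \alpha_m(\epsilon) = \hat{\epsilon}$ (4)

if $\mathrm{mmatch}(\vec{p}, m, \rho, \sigma) = (i, \theta, m')$ then $\forall \hat{m} \ge \alpha(m)$, $\forall \hat{\sigma} \ge \alpha(\sigma)$,
$\exists \hat{m}' \ge \alpha(m')$ such that

$(i, \alpha(\theta), \hat{m}') \in \widehat{\mathrm{mmatch}}(\vec{p}, \hat{m}, \alpha(\rho), \hat{\sigma})$ (5)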

Following the Abstract Interpretation framework, one can ex- 
ploit the soundness constraints to derive, by algebraic manipula- 
tion, the definitions of the abstract auxiliary functions which would 
then be correct by construction [24]. 

Definition 7 (Abstract Semantics). Once the abstract domains 
are fixed, the rules that define the abstract transition relation are 



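Functional reductions

FunEval
  if   $\pi(\iota) = \langle \ell\colon (e_0(e_1, \ldots, e_n)), \rho, a, t \rangle$
       $b := \mathrm{new}_{kpush}(\iota, \pi(\iota))$
  then $\pi' = \pi[\iota \mapsto \langle e_0, \rho, b, t \rangle]$
       $\sigma' = \sigma[b \mapsto \mathrm{Arg}_0\langle \ell, \epsilon, \rho, a \rangle]$

ArgEval
  if   $\pi(\iota) = \langle v, \rho, a, t \rangle$
       $\sigma(a) = \kappa = \mathrm{Arg}_i\langle \ell, d_0 \ldots d_{i-1}, \rho', c \rangle$
       $d_i := (v, \rho)$
       $b := \mathrm{new}_{kpop}(\iota, \kappa, \pi(\iota))$
  then $\pi' = \pi[\iota \mapsto \langle \ell.\mathrm{arg}_{i+1}, \rho', b, t \rangle]$
       $\sigma' = \sigma[b \mapsto \mathrm{Arg}_{i+1}\langle \ell, d_0 \ldots d_i, \rho', c \rangle]$

Apply
  if   $\pi(\iota) = \langle v, \rho, a, t \rangle$, $\mathrm{arity}(\ell) = n$
       $\sigma(a) = \kappa = \mathrm{Arg}_n\langle \ell, d_0 \ldots d_{n-1}, \rho', c \rangle$
       $d_0 = (\mathrm{fun}(x_1 \ldots x_n) \to e, \rho_0)$, $d_n := (v, \rho)$
       $b_i := \mathrm{new}_{va}(\iota, x_i, \mathrm{res}(\sigma, d_i), \pi(\iota))$
       $t' := \mathrm{tick}(\ell, \pi(\iota))$
  then $\pi' = \pi[\iota \mapsto \langle e, \rho'[x_1 \mapsto b_1 \ldots x_n \mapsto b_n], c, t' \rangle]$
       $\sigma' = \sigma[b_1 \mapsto d_1 \ldots b_n \mapsto d_n]$

Vars
  if   $\pi(\iota) = \langle x, \rho, a, t \rangle$
       $\sigma(\rho(x)) = (v, \rho')$
  then $\pi' = \pi[\iota \mapsto \langle v, \rho', a, t \rangle]$

Communication

Receive
  if   $\pi(\iota) = \langle \mathrm{receive}\ p_1 \to e_1 \ldots p_n \to e_n\ \mathrm{end}, \rho, a, t \rangle$
       $\mathrm{mmatch}(p_1 \ldots p_n, \mu(\iota), \rho, \sigma) = (i, \theta, m)$
       $\theta = [x_1 \mapsto d_1 \ldots x_k \mapsto d_k]$
       $b_j := \mathrm{new}_{va}(\iota, x_j, \mathrm{res}(\sigma, d_j), \pi(\iota))$
       $\rho' := \rho[x_1 \mapsto b_1 \ldots x_k \mapsto b_k]$
  then $\pi' = \pi[\iota \mapsto \langle e_i, \rho', a, t \rangle]$
       $\mu' = \mu[\iota \mapsto m]$
       $\sigma' = \sigma[b_1 \mapsto d_1 \ldots b_k \mapsto d_k]$

Send
  if   $\pi(\iota) = \langle v, \rho, a, t \rangle$
       $\sigma(a) = \kappa = \mathrm{Arg}_2\langle \ell, d, \iota', c \rangle$
       $d = (\mathrm{send}, \_)$
  then $\pi' = \pi[\iota \mapsto \langle v, \rho, c, t \rangle]$
       $\mu' = \mu[\iota' \mapsto \mathrm{enq}((v, \rho), \mu(\iota'))]$

Process creation

Spawn
  if   $\pi(\iota) = \langle \mathrm{fun}() \to e, \rho, a, t \rangle$
       $\sigma(a) = \mathrm{Arg}_1\langle \ell, d, \rho', c \rangle$
       $d = (\mathrm{spawn}, \_)$
       $\iota' := \mathrm{new}_{pid}(\iota, \ldots)$
  then $\pi' = \pi[\iota \mapsto \langle \iota', \rho', c, t \rangle,\ \iota' \mapsto \langle e, \rho, \ast, t_0 \rangle]$
       $\mu' = \mu[\iota' \mapsto \epsilon]$

Self
  if   $\pi(\iota) = \langle \mathrm{self}(), \rho, a, t \rangle$
  then $\pi' = \pi[\iota \mapsto \langle \iota, \rho, a, t \rangle]$

Initial state

Init  The initial state associated with a program $\mathcal{P}$ is
      $s_{\mathcal{P}} := \langle \pi_0, \mu_0, \sigma_0 \rangle$
      where $\pi_0 = [\iota_0 \mapsto \langle \mathcal{P}, [\,], \ast, t_0 \rangle]$
            $\mu_0 = [\iota_0 \mapsto \epsilon]$
            $\sigma_0 = [\ast \mapsto \mathrm{Stop}]$

Figure 2. Operational Semantics Rules. The tables define the transition relation
$s = \langle \pi, \mu, \sigma, \vartheta \rangle \to \langle \pi', \mu', \sigma', \vartheta' \rangle = s'$ by cases; the
primed components of the state are identical to the non-primed components, unless indicated otherwise in the "then" part of the rule. The
meta-variable $v$ stands for terms that cannot be further rewritten such as λ-abstractions, constructor applications and un-applied primitives.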



straightforward abstractions of the original ones. In Figure 3, we
present the abstract counterparts of the rules for the operational
semantics in Figure 2, defining the non-deterministic abstract
transition relation on abstract states $(\rightsquigarrow) \subseteq \widehat{\mathit{State}} \times \widehat{\mathit{State}}$. When
referring to a particular program $\mathcal{P}$, the abstract semantics is the
portion of the graph reachable from $\hat{s}_{\mathcal{P}}$.

Theorem 1 (Soundness of Analysis). Given a sound abstraction
of the basic domains, if $s \to s'$ and $\alpha_{cfa}(s) \le u$, then there exists
$u' \in \widehat{\mathit{State}}$ such that $\alpha_{cfa}(s') \le u'$ and $u \rightsquigarrow u'$.

See Appendix B for a proof of the Theorem. 

Now that we have defined a sound abstract semantics we give 
sufficient conditions for its computability. 

Theorem 2 (Decidability of Analysis). If a given (sound) abstraction
of the basic domains is finite, then the derived abstract transition
relation defined in Figure 3 is finite; it is also decidable if the
associated auxiliary operations (in Definition 6) are computable.

Proof. The proof is by a simple inspection of the rules: all the 
individual rules are decidable and the state space is finite. □ 

A Simple Mailbox Abstraction. Abstract mailboxes need to be
finite too in order for the analysis to be computable. By abstracting
addresses (and data) to a finite set, values, and thus messages,
become finite too. The only unbounded dimension of a mailbox
is then the length of the sequence of messages. We therefore
abstract mailboxes by losing information about the sequence and
collecting all the incoming messages in an un-ordered set:

$\mathcal{M}_{set} := \langle \mathcal{P}(\widehat{\mathit{Value}}), \subseteq, \cup, \alpha_{set}, \widehat{\mathrm{enq}}_{set}, \emptyset, \widehat{\mathrm{mmatch}}_{set} \rangle$

where the abstract version of enq is the insertion in the set, as easily
derived from the soundness requirement; the matching function
is similarly derived from the correctness condition: writing
$\vec{p} = p_1 \ldots p_n$,

$\alpha_{set}(m) := \{\alpha(d) \mid \exists i.\ m_i = d\}$

$\widehat{\mathrm{enq}}_{set}(\hat{d}, \hat{m}) := \{\hat{d}\} \cup \hat{m}$

$\widehat{\mathrm{mmatch}}_{set}(\vec{p}, \hat{m}, \hat{\rho}, \hat{\sigma}) := \{(i, \hat{\theta}, \hat{m}) \mid \hat{d} \in \hat{m},\ \hat{\theta} \in \widehat{\mathrm{matches}}(p_i, \hat{d})\}$

We omit the straightforward proof that this constitutes a sound
abstraction.
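A minimal Python sketch of the set-based mailbox abstraction follows. Messages are plain atoms, the pattern matcher `matches` is a toy stand-in (an assumption of this example: a pattern is either the wildcard `"_"` or an atom), clause indices are 0-based, and the environment and store arguments are omitted.

```python
def alpha_set(mailbox, alpha):
    """Forget order and multiplicity of a concrete mailbox (a list)."""
    return frozenset(alpha(d) for d in mailbox)

def enq_set(d_hat, m_hat):
    """Abstract enqueue: insertion in the set."""
    return frozenset({d_hat}) | m_hat

def mmatch_set(patterns, m_hat, matches):
    """Every message in the set may match every clause it fits.
    The mailbox is returned unchanged: multiplicities are unknown,
    so removing the matched message would not be sound."""
    return [(i, theta, m_hat)
            for i, p in enumerate(patterns)
            for d_hat in m_hat
            for theta in matches(p, d_hat)]

def matches(pattern, d_hat):
    """Toy matcher (hypothetical): wildcard or exact atom, no bindings."""
    return [{}] if pattern in ("_", d_hat) else []

identity = lambda d: d
concrete = ["a", "b", "a"]                      # a FIFO mailbox
abstract = alpha_set(concrete, identity)        # frozenset({"a", "b"})
results = mmatch_set(["b", "c"], abstract, matches)
```

With these definitions, condition (4) of Definition 6 can be checked directly on examples: abstracting after a concrete enqueue gives a subset of (here, exactly) the abstract enqueue of the abstractions.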

Abstracting Data. Because we included data in the value addresses in the
definition of VAddr, cutting contours alone, which would otherwise have been
sufficient, no longer makes this domain finite. A simple solution is to discard the
value completely by using the trivial data abstraction $\mathit{Data}_0 := \{\_\}$,
which is sound. If more precision is needed, any finite data
abstraction would do: the analysis would then be able to distinguish
states that differ only because of different bindings in their frame.

We present here a data abstraction particularly well-suited to
languages with algebraic data-types such as λActor: the abstraction
$\lfloor e \rfloor_D$ discards every sub-term of $e$ that is nested at a deeper level
than a parameter $D$.

$\lfloor (e, \hat{\rho}) \rfloor_{\hat{\sigma}, 0} := \{\_\} \qquad \lfloor (\mathrm{fun} \ldots, \hat{\rho}) \rfloor_{\hat{\sigma}, D+1} := \{\_\}$

$\lfloor (c(x_1 \ldots x_n), \hat{\rho}) \rfloor_{\hat{\sigma}, D+1} := \{ c(\hat{\delta}_1 \ldots \hat{\delta}_n) \mid \hat{d}_i \in \hat{\sigma}(\hat{\rho}(x_i)),\ \hat{\delta}_i \in \lfloor \hat{d}_i \rfloor_{\hat{\sigma}, D} \}$

where $\_$ is a placeholder for discarded subterms.

An analogous $D$-deep abstraction can be easily defined for concrete
values and we use the same notation for both; we use the
notation $\lfloor \delta \rfloor_D$ for the analogous function on elements of Data.

We define $\mathcal{D}_D = \langle \mathit{Data}_D, \alpha_D, \widehat{\mathrm{res}}_D \rangle$ to be the 'depth-D' data
abstraction where

$\mathit{Data}_{D+1} := \{\_\} \cup \{c(\delta_1 \ldots \delta_n) \mid \delta_i \in \mathit{Data}_D\}$

$\alpha_D(\delta) := \lfloor \delta \rfloor_D \qquad \widehat{\mathrm{res}}_D(\hat{\sigma}, \hat{d}) := \lfloor \hat{d} \rfloor_{\hat{\sigma}, D}$

The proof of its soundness is easy and we omit it.
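The depth-D truncation on closed terms can be sketched as follows. The term encoding (a constructor term is a pair of a name and a tuple of sub-terms, and the string `"<fun>"` stands for any function value) is an assumption made for this example.

```python
HOLE = "_"          # placeholder for discarded sub-terms
FUN = "<fun>"       # hypothetical stand-in for any function value

def truncate(term, depth):
    """Depth-D abstraction of a closed term: keep constructors down to
    the given depth and replace everything deeper, and every function,
    by the placeholder."""
    if depth == 0 or term == FUN:
        return HOLE
    name, args = term
    return (name, tuple(truncate(a, depth - 1) for a in args))

zero = ("zero", ())
two = ("succ", (("succ", (zero,)),))      # succ(succ(zero))

depth0 = truncate(two, 0)
depth1 = truncate(two, 1)
depth2 = truncate(two, 2)
```

At depth 1 only the outermost constructor survives, so all non-zero numerals collapse to `succ(_)`; each extra level of depth distinguishes one more layer of constructors.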

Abstracting Time. Let us now define a specific time abstraction
that amounts to a concurrent version of a standard k-CFA. A k-CFA
is an analysis parametric in $k$, which is able to distinguish dynamic
contexts up to the bound given by $k$. We proceed as in standard
k-CFA by truncating contours at length $k$ to obtain their abstract


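Functional abstract reductions

AbsFunEval
  if   $\hat{\pi}(\hat{\iota}) \ni \hat{q} = \langle \ell\colon (e_0(e_1, \ldots, e_n)), \hat{\rho}, \hat{a}, \hat{t} \rangle$
       $\hat{b} := \widehat{\mathrm{new}}_{kpush}(\hat{\iota}, \hat{q})$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle e_0, \hat{\rho}, \hat{b}, \hat{t} \rangle\}]$
       $\hat{\sigma}' = \hat{\sigma} \sqcup [\hat{b} \mapsto \{\mathrm{Arg}_0\langle \ell, \epsilon, \hat{\rho}, \hat{a} \rangle\}]$

AbsArgEval
  if   $\hat{\pi}(\hat{\iota}) \ni \hat{q} = \langle v, \hat{\rho}, \hat{a}, \hat{t} \rangle$
       $\hat{\sigma}(\hat{a}) \ni \hat{\kappa} = \mathrm{Arg}_i\langle \ell, \hat{d}_0 \ldots \hat{d}_{i-1}, \hat{\rho}', \hat{c} \rangle$
       $\hat{d}_i := (v, \hat{\rho})$
       $\hat{b} := \widehat{\mathrm{new}}_{kpop}(\hat{\iota}, \hat{\kappa}, \hat{q})$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle \ell.\mathrm{arg}_{i+1}, \hat{\rho}', \hat{b}, \hat{t} \rangle\}]$
       $\hat{\sigma}' = \hat{\sigma} \sqcup [\hat{b} \mapsto \{\mathrm{Arg}_{i+1}\langle \ell, \hat{d}_0 \ldots \hat{d}_i, \hat{\rho}', \hat{c} \rangle\}]$

AbsApply
  if   $\hat{\pi}(\hat{\iota}) \ni \hat{q} = \langle v, \hat{\rho}, \hat{a}, \hat{t} \rangle$, $\mathrm{arity}(\ell) = n$
       $\hat{\sigma}(\hat{a}) \ni \mathrm{Arg}_n\langle \ell, \hat{d}_0 \ldots \hat{d}_{n-1}, \hat{\rho}', \hat{c} \rangle$
       $\hat{d}_0 = (\mathrm{fun}(x_1 \ldots x_n) \to e, \hat{\rho}_0)$, $\hat{d}_n := (v, \hat{\rho})$
       $\hat{\delta}_i \in \widehat{\mathrm{res}}(\hat{\sigma}, \hat{d}_i)$
       $\hat{b}_i := \widehat{\mathrm{new}}_{va}(\hat{\iota}, x_i, \hat{\delta}_i, \hat{q})$
       $\hat{\rho}'' := \hat{\rho}'[x_1 \mapsto \hat{b}_1 \ldots x_n \mapsto \hat{b}_n]$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle e, \hat{\rho}'', \hat{c}, \widehat{\mathrm{tick}}(\ell, \hat{t}) \rangle\}]$
       $\hat{\sigma}' = \hat{\sigma} \sqcup [\hat{b}_1 \mapsto \{\hat{d}_1\} \ldots \hat{b}_n \mapsto \{\hat{d}_n\}]$

AbsVars
  if   $\hat{\pi}(\hat{\iota}) \ni \langle x, \hat{\rho}, \hat{a}, \hat{t} \rangle$
       $\hat{\sigma}(\hat{\rho}(x)) \ni (v, \hat{\rho}')$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle v, \hat{\rho}', \hat{a}, \hat{t} \rangle\}]$

Abstract communication

AbsReceive
  if   $\hat{\pi}(\hat{\iota}) \ni \hat{q} = \langle e, \hat{\rho}, \hat{a}, \hat{t} \rangle$
       $e = \mathrm{receive}\ p_1 \to e_1 \ldots p_n \to e_n\ \mathrm{end}$
       $\widehat{\mathrm{mmatch}}(p_1 \ldots p_n, \hat{\mu}(\hat{\iota}), \hat{\rho}, \hat{\sigma}) \ni (i, \hat{\theta}, \hat{m})$
       $\hat{\theta} = [x_1 \mapsto \hat{d}_1 \ldots x_k \mapsto \hat{d}_k]$
       $\hat{\delta}_j \in \widehat{\mathrm{res}}(\hat{\sigma}, \hat{d}_j)$
       $\hat{b}_j := \widehat{\mathrm{new}}_{va}(\hat{\iota}, x_j, \hat{\delta}_j, \hat{q})$
       $\hat{\rho}' := \hat{\rho}[x_1 \mapsto \hat{b}_1 \ldots x_k \mapsto \hat{b}_k]$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle e_i, \hat{\rho}', \hat{a}, \hat{t} \rangle\}]$
       $\hat{\mu}' = \hat{\mu}[\hat{\iota} \mapsto \hat{m}]$
       $\hat{\sigma}' = \hat{\sigma} \sqcup [\hat{b}_1 \mapsto \{\hat{d}_1\} \ldots \hat{b}_k \mapsto \{\hat{d}_k\}]$

AbsSend
  if   $\hat{\pi}(\hat{\iota}) \ni \langle v, \hat{\rho}, \hat{a}, \hat{t} \rangle$
       $\hat{\sigma}(\hat{a}) \ni \mathrm{Arg}_2\langle \ell, \hat{d}, \hat{\iota}', \hat{c} \rangle$
       $\hat{d} = (\mathrm{send}, \_)$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle v, \hat{\rho}, \hat{c}, \hat{t} \rangle\}]$
       $\hat{\mu}' = \hat{\mu}[\hat{\iota}' \mapsto \widehat{\mathrm{enq}}((v, \hat{\rho}), \hat{\mu}(\hat{\iota}'))]$

Abstract process creation

AbsSpawn
  if   $\hat{\pi}(\hat{\iota}) \ni \langle \mathrm{fun}() \to e, \hat{\rho}, \hat{a}, \hat{t} \rangle$
       $\hat{\sigma}(\hat{a}) \ni \mathrm{Arg}_1\langle \ell, \hat{d}, \hat{\rho}', \hat{c} \rangle$
       $\hat{d} = (\mathrm{spawn}, \_)$
       $\hat{\iota}' := \widehat{\mathrm{new}}_{pid}(\hat{\iota}, \ldots)$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle \hat{\iota}', \hat{\rho}', \hat{c}, \hat{t} \rangle\},\ \hat{\iota}' \mapsto \{\langle e, \hat{\rho}, \ast, \hat{t}_0 \rangle\}]$
       $\hat{\mu}' = \hat{\mu} \sqcup [\hat{\iota}' \mapsto \hat{\epsilon}]$

AbsSelf
  if   $\hat{\pi}(\hat{\iota}) \ni \langle \mathrm{self}(), \hat{\rho}, \hat{a}, \hat{t} \rangle$
  then $\hat{\pi}' = \hat{\pi} \sqcup [\hat{\iota} \mapsto \{\langle \hat{\iota}, \hat{\rho}, \hat{a}, \hat{t} \rangle\}]$

Initial abstract state

AbsInit  The initial state associated with a program $\mathcal{P}$ is
         $\hat{s}_{\mathcal{P}} := \langle \hat{\pi}_0, \hat{\mu}_0, \hat{\sigma}_0 \rangle$
         where $\hat{\pi}_0 = [\hat{\iota}_0 \mapsto \{\langle \mathcal{P}, [\,], \ast, \hat{t}_0 \rangle\}]$
               $\hat{\mu}_0 = [\hat{\iota}_0 \mapsto \hat{\epsilon}]$
               $\hat{\sigma}_0 = [\ast \mapsto \{\mathrm{Stop}\}]$

Figure 3. Rules defining the Abstract Semantics. The tables describe the conditions under which a transition
$\hat{s} = \langle \hat{\pi}, \hat{\mu}, \hat{\sigma} \rangle \rightsquigarrow \langle \hat{\pi}', \hat{\mu}', \hat{\sigma}' \rangle = \hat{s}'$ can fire; the primed versions of the components of the states are identical to the non-primed ones unless indicated otherwise in the
"then" part of the corresponding rule. We write $\sqcup$ for the join operation of the appropriate domain.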



counterparts:

$\widehat{\mathit{Time}}_k := \bigcup_{0 \le i \le k} \mathit{ProgLoc}^i \qquad \alpha_t^k(\ell_1 \ldots \ell_k \cdot t) := \ell_1 \ldots \ell_k$
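The truncation of contours can be sketched as below. The abstract tick `tick_k` is not spelled out in the text; the version here (prepend, then truncate) is an assumed but natural choice, and the loop checks soundness condition (1) of Definition 6 on a few sample contours. Since the abstract time domain is flat, the inequality in (1) amounts to equality.

```python
def tick(loc, t):
    """Concrete tick: prepend the program location to the contour."""
    return (loc,) + tuple(t)

def alpha_k(t, k):
    """Abstract a contour by keeping only its k most recent locations."""
    return tuple(t[:k])

def tick_k(loc, t_hat, k):
    """Assumed abstract tick: prepend, then truncate to length k."""
    return ((loc,) + tuple(t_hat))[:k]

samples = [(), ("a",), ("a", "b"), ("a", "b", "c")]
condition_1_holds = all(
    alpha_k(tick("l", t), k) == tick_k("l", alpha_k(t, k), k)
    for k in range(4)
    for t in samples
)
```

For `k = 0` every contour is abstracted to the empty tuple, which is the situation exploited by the 0-CFA instantiation discussed next.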

The simplest analysis we can then define is the one induced by
the basic domains abstraction $\langle \mathit{Data}_0, \mathit{Time}_0, \mathit{Mailbox}_{set} \rangle$. With
this instantiation many of the domains collapse into singletons.
Implementing the analysis as it is would however lead to an
exponential algorithm, because it would record a separate store and
separate mailboxes for each abstract state. To get a better complexity bound, we
apply a widening following the lines of [33, Section 7]: instead
of keeping a separate store and separate mailboxes for each state
we can join them, keeping just a global copy of each. This
significantly reduces the space we need to explore: the algorithm becomes
polynomial time in the size of the program (which is reflected in
the size of ProgLoc).

Considering other abstractions for the basic domains easily
leads to exponential algorithms; in particular, the state space grows
linearly with respect to the size of the abstract data domain, so the complexity of the
analysis using $\mathit{Data}_D$ is exponential in $D$.

Dealing with open programs. Often it is useful to verify an open
expression where its input is taken from a regular set of terms
(see [28]). We can reproduce this in our setting by introducing a
new primitive choice that non-deterministically calls one of its
arguments. For instance, an interesting way of closing N in Example 1
would be by binding it to any_num():

letrec ...
  any_num() = choice(fun() -> zero,
                     fun() -> {succ, any_num()}).
in C = cell_start(), add_to_cell(any_num(), C).

Now the uncoverability of the state where more than one instance
of inc is running the protected section would prove that mutual
exclusion is ensured for any number of concurrent copies of inc.



6. Generating the Actor Communicating System 

The CFA algorithm we presented allows us to derive a sound 'flat'
representation of the control-flow of the program. The analysis
takes into account higher-order computation and (limited) information
about synchronization. Now that we have this rough scheme of
the possible transitions, we can 'guard' those transitions with the
actions that must accompany them. These guards,
in the form of 'receive a message of this form', 'send a message
of this form' or 'spawn this process', cannot be modelled faithfully
while retaining decidability of useful verification problems,
as noted in Section 3. The best we can do, while remaining sound,
is to relax the synchronization and process creation primitives with
counting abstractions and use the guards to restrict the applicability
of the transitions. In other words, these guarded (labelled) rules
will form the definition of an ACS that simulates the semantics of
the input λActor program.

Terminology. We identify a common pattern of the rules in Figure 3.
In each rule R, the premise distinguishes an abstract pid $\hat{\iota}$ and an
abstract process state $\hat{q} = \langle e, \hat{\rho}, \hat{a}, \hat{t} \rangle$ associated with $\hat{\iota}$, i.e. $\hat{q} \in \hat{\pi}(\hat{\iota})$,
and the conclusion of the rule associates a new abstract process
state, call it $\hat{q}'$, with $\hat{\iota}$, i.e. $\hat{q}' \in \hat{\pi}'(\hat{\iota})$. Henceforth we shall refer
to $(\hat{\iota}, \hat{q}, \hat{q}')$ as the active components of the rule R.

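Definition 8 (Generated ACS). Given a λActor program $\mathcal{P}$, a
sound basic domains abstraction $\mathcal{I} = \langle \mathcal{T}, \mathcal{M}, \mathcal{D} \rangle$ and a sound data
abstraction for messages $\mathcal{D}_{msg} = \langle \widehat{\mathit{Msg}}, \alpha_{msg}, \widehat{\mathrm{res}}_{msg} \rangle$,
the Actor communicating system generated by $\mathcal{P}$, $\mathcal{I}$ and $\mathcal{D}_{msg}$ is
defined as

$\mathcal{A}_{\mathcal{P}} := \langle \widehat{\mathit{Pid}}, \widehat{\mathit{ProcState}}, \widehat{\mathit{Msg}}, R, \alpha(\iota_0), \alpha(\pi_0(\iota_0)) \rangle$

where $s_{\mathcal{P}} = \langle \pi_0, \mu_0, \sigma_0, t_0 \rangle$ is the initial state (according to Init)
with $\pi_0 = [\iota_0 \mapsto \langle \mathcal{P}, [\,], \ast, t_0 \rangle]$ and the rules in $R$ are defined by
induction over the following rules.

(i) If $\hat{s} \rightsquigarrow \hat{s}'$ is proved by rule AbsFunEval or AbsArgEval or
AbsApply with active components $(\hat{\iota}, \hat{q}, \hat{q}')$, then

$\hat{\iota}\colon \hat{q} \xrightarrow{\tau} \hat{q}' \in R$ (AcsTau)

(ii) If $\hat{s} \rightsquigarrow \hat{s}'$ is proved by AbsReceive with active components
$(\hat{\iota}, \hat{q}, \hat{q}')$ where $\hat{d} = (p_i, \hat{\rho}')$ is the abstract message matched by
mmatch and $\hat{m} \in \widehat{\mathrm{res}}_{msg}(\hat{\sigma}, \hat{d})$, then

$\hat{\iota}\colon \hat{q} \xrightarrow{?\hat{m}} \hat{q}' \in R$ (AcsRec)

(iii) If $\hat{s} \rightsquigarrow \hat{s}'$ is proved by AbsSend with active components
$(\hat{\iota}, \hat{q}, \hat{q}')$ where $\hat{d}$ is the abstract value that is sent and
$\hat{m} \in \widehat{\mathrm{res}}_{msg}(\hat{\sigma}, \hat{d})$, then

$\hat{\iota}\colon \hat{q} \xrightarrow{\hat{\iota}'!\hat{m}} \hat{q}' \in R$ (AcsSend)

(iv) If $\hat{s} \rightsquigarrow \hat{s}'$ is proved by AbsSpawn with active components
$(\hat{\iota}, \hat{q}, \hat{q}')$ where $\hat{\iota}'$ is the new abstract pid that is generated in the
premise of the rule, which gets associated with the process state
$\hat{q}'' = \langle e, \hat{\rho}, \ast, \hat{t}_0 \rangle$, then

$\hat{\iota}\colon \hat{q} \xrightarrow{\nu \hat{\iota}'.\hat{q}''} \hat{q}' \in R$ (AcsSp)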



As we will make precise later, keeping Pid and ProcState 
small is of paramount importance for the model checking of the 
generated ACS to be feasible. This is the main reason why we keep 
the message abstraction independent from the data abstraction: this 
allows us to increase precision with respect to types of messages, 
which is computationally cheap, and keep the expensive precision 
on data as low as possible. It is important to note that these two 
'dimensions' are in fact independent and a more precise message 
space enhances the precision of the ACS even when using $\mathit{Data}_0$
as the data abstraction.

In our examples (and in our implementation) we use a $\mathit{Data}_D$
abstraction for messages where $D$ is the maximum depth of the
receive patterns of the program.

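Definition 9. The abstraction function

$\alpha_{acs} : \mathit{State} \to (\widehat{\mathit{Pid}} \times (\widehat{\mathit{ProcState}} \uplus \widehat{\mathit{Msg}}) \to \mathbb{N})$

relating concrete states and states of the ACS is defined as

$\alpha_{acs}(s) := \begin{cases} (\hat{\iota}, \hat{q}) \mapsto |\{\iota \mid \alpha(\iota) = \hat{\iota},\ \alpha(\pi(\iota)) = \hat{q}\}| \\ (\hat{\iota}, \hat{m}) \mapsto |\{(\iota, i) \mid \alpha(\iota) = \hat{\iota},\ \alpha_{msg}(\mathrm{res}(\sigma, \mu(\iota)_i)) = \hat{m}\}| \end{cases}$

where $s = \langle \pi, \mu, \sigma \rangle$.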
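The counting abstraction of Definition 9 can be sketched with a multiset. In this simplified Python version, messages are assumed to be already resolved to closed terms (so the store plays no role), and the three abstraction functions passed in the example are illustrative assumptions.

```python
from collections import Counter

def alpha_acs(procs, mailboxes, alpha_pid, alpha_state, alpha_msg):
    """Count, per pid-class, how many processes are in each abstract
    control state and how many messages of each abstract kind are
    pending in the mailboxes of that class."""
    counts = Counter()
    for pid, state in procs.items():
        counts[(alpha_pid(pid), ("state", alpha_state(state)))] += 1
    for pid, mailbox in mailboxes.items():
        for msg in mailbox:
            counts[(alpha_pid(pid), ("msg", alpha_msg(msg)))] += 1
    return counts

# Two clients waiting for the lock and one free cell (hypothetical data).
procs = {("inc", 1): "inc1", ("inc", 2): "inc1", ("cell", 0): "res_free"}
mailboxes = {("cell", 0): [("lock", 1), ("lock", 2)]}

vector = alpha_acs(
    procs,
    mailboxes,
    alpha_pid=lambda pid: pid[0],      # forget which instance it is
    alpha_state=lambda q: q,           # control states kept as they are
    alpha_msg=lambda m: m[0],          # keep only the message tag
)
```

The two lock requests become indistinguishable tokens: only their number, and the pid-class of the mailbox holding them, is retained.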

It is important to note that most of the decidable properties 
of the generated ACS are not even expressible on the CFA graph 
alone: being able to predicate on the contents of the counters means 
we can decide boundedness, mutual exclusion and many other 
expressive properties. The next example shows one simple way in 
which the generated ACS can be more precise than the bare CFA 
graph. 

Example 2 (Generated ACS). Given the following program:

letrec
  server = fun() -> receive {init, P, X} ->
                      send(P, ok), do_serve(X)
                    end.
  do_serve = fun(X) -> receive
                         {init, _, _} -> error;
                         {set, Y} -> do_serve(Y);
                         {get, P} -> send(P, X), do_serve(X)
                       end.

our algorithm would output the following ACS starting from
'main':⁴

$\iota_0$: (main) --$\nu\iota_s$.server--> ○ --$\iota_s$!init--> ○ --?ok--> ○ --$\iota_s$!set--> ○

$\iota_s$: (server) --?init--> (do_serve) --$\iota_0$!ok--> (receive) --?init--> (error)



The error state is reachable in the CFA graph but not in its Parikh 
semantics: the token init is only sent once and never after ok is 
sent back to the main process. Once init has been consumed in the 
transition from 'server' to 'do_serve' the counter for it will remain 
set to zero forever. 
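To make the counter argument concrete, the rules of the ACS of this example can be encoded as a small vector addition system and explored exhaustively. This is only a naive forward search, adequate here because this particular net has finitely many reachable markings; it is not the algorithm used by a real coverability engine. The place names (`"i0.m1"`, `"is.msg.init"`, and so on) are invented labels for the states and message counters of the two pid-classes.

```python
from collections import deque

# Each rule is (tokens consumed, tokens produced); places are named
# "<pid-class>.<state>" or "<pid-class>.msg.<tag>" (hypothetical names).
RULES = [
    ({"i0.main": 1}, {"i0.m1": 1, "is.server": 1}),               # spawn server
    ({"i0.m1": 1}, {"i0.m2": 1, "is.msg.init": 1}),               # is!init
    ({"i0.m2": 1, "i0.msg.ok": 1}, {"i0.m3": 1}),                 # ?ok
    ({"i0.m3": 1}, {"i0.m4": 1, "is.msg.set": 1}),                # is!set
    ({"is.server": 1, "is.msg.init": 1}, {"is.do_serve": 1}),     # ?init
    ({"is.do_serve": 1}, {"is.receive": 1, "i0.msg.ok": 1}),      # i0!ok
    ({"is.receive": 1, "is.msg.init": 1}, {"is.error": 1}),       # ?init
]
INITIAL = {"i0.main": 1}

def fire(marking, consume, produce):
    """Return the successor marking, or None if the rule is disabled."""
    m = dict(marking)
    for place, n in consume.items():
        if m.get(place, 0) < n:
            return None
        m[place] -= n
    for place, n in produce.items():
        m[place] = m.get(place, 0) + n
    return frozenset((p, c) for p, c in m.items() if c > 0)

def coverable(initial, rules, target, limit=10_000):
    """Breadth-first search for a reachable marking covering `target`."""
    start = frozenset(initial.items())
    seen, queue = {start}, deque([start])
    while queue:
        marking = queue.popleft()
        counts = dict(marking)
        if all(counts.get(p, 0) >= n for p, n in target.items()):
            return True
        for consume, produce in rules:
            nxt = fire(marking, consume, produce)
            if nxt is not None and nxt not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("too many markings for naive search")
                seen.add(nxt)
                queue.append(nxt)
    return False

error_coverable = coverable(INITIAL, RULES, {"is.error": 1})
receive_coverable = coverable(INITIAL, RULES, {"is.receive": 1})
```

The search finds that the 'receive' state can be reached but 'error' cannot be covered: the single init token has already been consumed by the time the last rule could fire.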

Theorem 3 (Soundness of generated ACS). For all choices of
$\mathcal{I}$ and $\mathcal{D}_{msg}$, for all concrete states $s$ and $s'$, if $s \to s'$ and
$\alpha_{acs}(s) \le v$ then there exists $v'$ such that $\alpha_{acs}(s') \le v'$ and
$v \to_{acs} v'$.

See Appendix C for a proof of the Theorem.

Corollary 1 (Simulation). Let $\mathcal{A}_{\mathcal{P}}$ be the ACS derived from a given
λActor program $\mathcal{P}$. We have that $[\![\mathcal{A}_{\mathcal{P}}]\!]$ simulates the semantics of $\mathcal{P}$:
for each $\mathcal{P}$-run $s \to s_1 \to s_2 \to \cdots$ there exists a $[\![\mathcal{A}_{\mathcal{P}}]\!]$-run
$v \to_{acs} v_1 \to_{acs} v_2 \to_{acs} \cdots$ such that $\alpha_{acs}(s) = v$ and for all $i$,
$\alpha_{acs}(s_i) \le v_i$.

Simulation preserves all paths so reachability (and coverability)
is preserved.

Corollary 2. If there is no $v \ge \alpha_{acs}(s')$ such that $\alpha_{acs}(s) \to^*_{acs} v$
then $s \not\to^* s'$.

Example 3 (ACS Generated from Example 1). A (simplified)
pictorial representation of the ACS generated by our procedure
from the program in Example 1 (with the parametric entry point of
Section 5) is shown in Figure 4, using a 0-CFA analysis. The three
pid-classes correspond to the starting process $\iota_0$ and the two static
calls of spawn in the program, the one for the shared cell process $\iota_c$
and the other, $\iota_i$, for all the processes running inc.

The first component of the ACS, the starting one, just spawns
a shared cell and an arbitrary number of concurrent copies of the
third component; these actions increment the counters associated
with states 'res_free' and 'inc_0'. The second component represents
the intended protocol quite closely; note that by abstracting messages
they essentially become tokens and do not have a payload
anymore. The rules of the third component clearly show its sequential
behaviour. The entry point is ($\iota_0$, cell_start).

The VAS semantics is accurate enough in this case to prove
mutual exclusion of, say, state 'inc_2', which is protected by locks.
Let's say for example that $n > 0$ processes of pid-class $\iota_i$ reached
state 'inc_1'; each of them sent a lock message to the cell; note
that now the message does not contain the pid of the requester
so all these messages are indistinguishable; moreover the order
of arrival is lost, we just count them. Suppose that $\iota_c$ is in state
'res_free'; since the counter for lock is $n$ and hence not zero, the
rule labeled with ?lock is enabled; however, once fired the counter
for 'res_free' is zero and the rule is disabled. Now exactly one ack
can be sent to the 'collective' mailbox of pid-class $\iota_i$ so the rule
receiving the ack is enabled; but as soon as it is fired, the only ack
message is consumed and no other $\iota_i$ process can proceed. This
holds until the lock is released and so on. Hence only one process
at a time can be in state 'inc_2'. This property can be stated as a
coverability problem: can inc_2 = 2 be covered? Since the VAS
semantics is given in terms of a VAS, the property is decidable



in S = spawn(server), send(S, {init, self(), a}),
   receive ok -> send(S, {set, b}) end.

4 Labels are abbreviated to unclutter the picture; for example {init,_,_}
is abbreviated with init.



[Figure 4: diagram of the three ACS components. Component $\iota_0$ has states
cell_start, res_start, sp_inc and stop; component $\iota_c$ has state res_free and
receives ?lock; component $\iota_i$ has states inc_0 to inc_4 and stop, with edge
labels including ?ack, $\iota_c$!unlock and $\iota_c$!req.]

Figure 4. ACS generated by the algorithm from Example 1



and the answer can be algorithmically calculated. As we saw the 
answer is negative and then, by soundness, we can infer it holds in 
the actual semantics of the input program too. 

Complexity of the Generation. Generating an ACS from a program
amounts to calculating the analysis of Section 5 and aggregating
the relevant ACS rules for each transition of the analysis.
Since we are adding O(1) rules to R for each transition, the
complexity of the generation is the same as the complexity of the
analysis itself. The only reason for adding more than one rule to R
for a single transition is the cardinality of Msg, but since this costs
only a constant overhead, increasing the precision with respect to
message types is not as expensive as adopting more precise data
abstractions.

Dimension of the Abstract Model. The complexity of coverabil- 
ity on VAS is EXPSPACE in the dimension of the VAS; hence for the 
approach to be practical, it is critical to keep the number of com- 
ponents of the VAS underlying the generated ACS small; in what 
follows we call dimension of an ACS the dimension of the VAS 
underlying its VAS semantics. 

Our algorithm produces an ACS with dimension
$(|\widehat{\mathit{ProcState}}| + |\widehat{\mathit{Msg}}|) \times |\widehat{\mathit{Pid}}|$. With the 0-CFA abstraction described at the end of
Section 5, ProcState is polynomial in the size of the program and
Pid is linear in the size of the program so, assuming $|\widehat{\mathit{Msg}}|$ to be a
constant, the dimension of the generated ACS is polynomial in the
size of the program, in the worst case. Due to the parametricity of
the abstract interpretation we can adjust for the right levels of
precision and speed. For example, if the property at hand is not sensitive
to pids, one can choose a coarser pid abstraction. It is also possible
to greatly reduce ProcState: we observe that many of the control
states result from intermediate functional reductions; such reductions
performed by different processes are independent, thanks to
the actor model paradigm. This allows for the use of preorder
reductions. In our prototype, as described in Section 7, we implemented
a simple reduction that safely removes states which only
represent internal functional transitions, irrelevant to the property
at hand. This has proven to be a simple yet effective transformation
yielding a significant speedup. We conjecture that, after the reduction,
the cardinality of ProcState is quadratic only in the number
of send, spawn and receive of the program.

letrec no_a = fun(X) -> case X of a -> error; b -> ok end.
       send_b = fun(P) -> send(P, b), send_a(P).
       send_a = fun(P) -> send(P, a), send_b(P).
       stutter = fun(F) -> receive _ -> unstut(F) end.
       unstut = fun(F) -> receive X -> F(X), stutter(F) end.
in P = spawn(fun() -> stutter(no_a)), send_a(P).

Figure 5. A program that Soter cannot verify because of the
sequencing in mailboxes

7. Evaluation, Limitations and Extensions 

Empirical Evaluation. To evaluate the feasibility of the approach,
we have constructed Soter, a prototype implementation
of our method for verifying Erlang programs. Written in Haskell,
Soter takes as input a single Erlang module annotated with safety
properties in the form of simple assertions. Soter supports the full
higher-order fragment and the (single-node) concurrency and
communication primitives of Erlang; features not supported by Soter
are described in Remark 1. For more details about the tool see [11].
The annotated Erlang module is first compiled to Core Erlang by
the Erlang compiler. A 0-CFA-like analysis, with support for the
$\mathit{Data}_D$ data and message abstraction, is then performed on the
compiled module; subsequently an ACS is generated. The ACS is simplified
and then fed to the backend model-checker along with coverability
queries translated from the annotations in the input Erlang
program. Soter's backend is the tool BFC [18] which features a fast
coverability engine for a variant of VAS. At the end of the
verification pathway, if the answer is YES then the program is safe with
respect to the input property, otherwise the analysis is inconclusive.

In Table 1 we summarise our experimental results. Many of the
examples are higher-order and use dynamic (and unbounded) process
creation and non-trivial synchronization. Example 1 appears
as reslock and Soter proves mutual exclusion of the clients' critical
section. concdb is the example program of [16] for which we prove
mutual exclusion. pipe is inspired by the 'pipe' example of [19];
the property proved here is boundedness of mailboxes. sieve is a
dynamically spawning higher-order concurrent implementation of
Eratosthenes' sieve inspired by a program by Rob Pike;⁵ Soter can
prove all the mailboxes are bounded.

All example programs, annotated with coverability queries, can
be viewed and verified using Soter at http://mjolnir.cs.ox.ac.uk/soter/.

Limitations. There are programs and properties that cannot be
proved using any of the presented abstractions. (i) The program
in Figure 5 defines a simple function that discards a message in
the mailbox and feeds the next to its functional argument, and so
on in a loop. Another process sends a 'bad argument' and a good
one in alternation such that only the good ones are fed to the
function. The property is that the function is never called with a bad
argument. This cannot be proved because sequential information of
the mailboxes, which is essential for the verification, is lost in the
counter abstraction. (ii) The program in Figure 6 defines a
higher-order combinator that spawns a number of identical workers,
each applied to a different task in a list. It then waits for all the
workers to return a result before collecting them in a list which is
subsequently returned. The desired property is that the combinator
only returns when every worker has sent back its result.
Unfortunately, to prove this property, stack reasoning is required, which is
beyond the capabilities of an ACS.

Refinement and Extensions. Our parametric definition of the 
abstract semantics allows us to tune the precision of the analysis 



5 see "Concurrency and message passing in Newsqueak", http://youtu.be/hB05UFq0tFA



                              ABSTR  ACS SIZE             TIME
Example        LOC PRP  SAFE?  D  M Places Ratio Analysis Simpl   BFC Total
reslock        356   1  yes       2     40   10%     0.56  0.08  0.82  1.48
sieve          230   :  yes       2     47   19%     0.26  0.03  2.46  2.76
concdb         321   1  yes       2     67   12%     1.10  0.16  5.19  6.46
state_factory  295   2  yes       1     22    4%     0.59  0.13  0.02  0.75
pipe           173   1  yes             18    8%     0.15  0.03  0.00  0.18
ring           211   1  yes       2     36    9%     0.55  0.07  0.25  0.88
parikh         101   1  yes       2     42   41%     0.05  0.01  0.07  0.13
unsafe_send     49   1  no        1     10   38%     0.02  0.00  0.00  0.02
safe_send       82   1  no*       1     33   36%     0.05  0.01  0.00  0.06
safe_send       82   4  yes    1  2     82   34%     0.23  0.03  0.06  0.32
firewall       236   1  no*       2     35   10%     0.36  0.05  0.02  0.44
firewall       236   1  yes    1  3     74   10%     2.38  0.30  0.00  2.69
finite_leader  555   1  no*       2     56   20%     0.35  0.03  0.01  0.40
finite_leader  555   1  yes    1  3     97   23%     0.75  0.07  0.86  1.70
stutter        115   1  no*             15   19%     0.04  0.00  0.00  0.05
howait         187   1  no*       2     29   14%     0.19  0.02  0.00  0.22


Table 1. Soter Benchmarks. The number of lines of code refers to the compiled Core Erlang. The PRP column indicates the number of 
properties which need to be proved. The columns D and M indicate the data and message abstraction depth respectively. In the "Safe?" 
column, "no*" means that the program satisfies the properties but the verification was inconclusive; "no" means that the program is not safe 
and Soter finds a genuine counterexample. "Places" is the number of places of the underlying Petri net after the simplification; "Ratio" is the 
ratio of the number of places of the generated Petri net before and after the simplification. All times are in seconds. 



letrec
  worker= fun(Task) -> ...
  spawn_wait= fun(F, L) -> spawn_wait'(F, fun() -> [], L).
  spawn_wait'= fun(F, G, L) ->
    case L of
      [] -> G();
      [T|Ts] ->
        S = self(),
        C = spawn(fun() ->
              send(S, {ans, self(), F(T)})),
        F' = fun() ->
               receive
                 {ans, C, R} -> [R | G()]
               end,
        spawn_wait'(F, F', Ts)
    end.
in spawn_wait(worker, [task1, task2, ...]).



Figure 6. A program that Soter cannot verify because of the stack 



when the abstraction is too coarse for the property to be proved. For 
safety properties, the counter-example witnessing a no-instance is a 
finite run of the abstract model. We conjecture that, given a spurious 
counter-example, it is possible to compute a suitable refinement 
of the basic domains abstraction so that the counter-example is no 
longer a run of the corresponding abstract semantics. However a 
naive implementation of the refinement loop would suffer from 
state explosion. A feasible CEGAR loop will need to utilise sharper 
abstractions: it is possible for example to pinpoint a particular pid 
or call or mailbox for which the abstract domains need to be more 
precise while coarsely abstracting the rest. The development of a 
fully-fledged CEGAR loop is a topic of ongoing research. 

The general architecture of our approach, combining static anal- 
ysis and abstract model generation, can be adapted to accommodate 
different language features and different abstract models. By appro- 
priate decoration of the analysis, it is possible to derive even more 
complex models for which semi-decision verification procedures 
have been developed [4, 21]. 



8. Related Work 

Static Analysis. Verification or bug-finding tools for Erlang [6- 
8, 20, 22, 27] typically rely on static analysis. The information 
obtained, usually in the form of a call graph, is then used to extract 
type constraints or infer runtime properties. Examples of static 
analyses of Erlang programs in the literature include data-flow [6], 
control-flow [20, 27] and escape [7] analyses. 

Van Horn and Might [25] derive a CFA for a multithreaded ex- 
tension of Scheme, using the same methodology [33] that we fol- 
low. The concurrency model therein is thread-based, and uses a 
compare-and-swap primitive. Our contribution, in addition to ex- 
tending the methodology to Actor concurrency, is to use the derived 
parametric abstract interpretation to bootstrap the construction of 
an infinite-state abstract model for automated verification. 

Reppy and Xiao [31] and Colby [9] analyse the channel commu- 
nication patterns of Concurrent ML (CML). CML is based on typed 
channels and synchronous message passing, unlike the Actor-based 
concurrency model of Erlang. 

Venet [34] proposed an abstract interpretation framework for 
the analysis of the π-calculus, later extended to other process algebras 
by Feret [12] and applied to CAP, a process calculus based on the 
Actor model, by Garoche [15]. In particular, Feret's non-standard 
semantics can be seen as an alternative to Van Horn and Might's 
methodology, but tailored for process calculi. 



Model Checking. Huch [16] uses abstract interpretation and 
model checking to verify LTL-definable properties of a restricted 
fragment of Erlang programs: (i) order-one (ii) tail-recursive (sub- 
sequently relaxed in a follow-up paper [17]), (iii) mailboxes are 
bounded (iv) programs spawn a fixed, statically computable, num- 
ber of processes. Given a data abstraction function, his method 
transforms a program to an abstract, finite-state model; if a path 
property can be proved for the abstract model, then it holds for 
the input Erlang program. In contrast, our method can verify Er- 
lang programs of every finite order, with no restriction on the size 
of mailboxes, or the number of processes that may be spawned. 
Since our method of verification is by transformation to a decid- 
able infinite-state system that simulates the input program, it is 
capable of greater accuracy. 



McErlang is a model checker for Erlang programs developed by 
Fredlund and Svensson [14]. Given a program, a Büchi automaton, 
and an abstraction function, McErlang explores on-the-fly a product
of an abstract model of the program and the Büchi automaton 
encoding a property. When the abstracted model is infinite-state, 
McErlang's exploration may not terminate. McErlang implements 
a fully-fledged Erlang runtime system, and it supports a substantial 
part of the language, including distributed and fault-tolerant fea- 
tures. 

ACS can be expressed as processes in a suitable variant of
CCS [26]. Decidable fragments of process calculi have been used
in the literature to verify concurrent systems. Meyer [23] isolated
a rich fragment of the π-calculus called depth-bounded. For certain
patterns of communication, this fragment can be the basis of
an abstract model that avoids the "merging" of mailboxes of the
processes belonging to the same pid-class. Erlang programs, however,
can express processes which are not depth-bounded. We plan
to address the automatic abstraction of arbitrary Erlang programs
as depth-bounded processes elsewhere.

Bug finding. Dialyzer [7, 8, 20] is a popular bug finding tool, included
in the standard Erlang/OTP distribution. Given an Erlang 
program, the tool uses flow and escape [29] analyses to detect spe- 
cific error patterns. Building on top of Dialyzer's static analysis, 
success types are derived. Lindahl and Sagonas' success types [20] 
'never disallow the use of a function that will not result in a type 
clash during runtime' and thus never generate false positives. Di- 
alyzer puts to good use the type annotations that programmers do 
use in practice; it scales well and is effective in detecting 'discrep- 
ancies' in Erlang code. However, success typing cannot be used to 
verify program correctness. 

Conclusion. We have defined a generic analysis for λActor, and
a way of extracting from the analysis a simulating infinite-state
abstract model in the form of an ACS, which can be automatically
verified for coverability: if a state of the abstract model is not
coverable then the corresponding concrete states of the input λActor
program are not reachable. Our constructions are parametric on the
abstractions for Time, Mailbox and Data, thus enabling different
analyses (implementing varying degrees of precision with different
complexity bounds) to be easily instantiated. In particular, with a
0-CFA-like specialisation of the framework, the analysis and generation
of the ACS are computable in polynomial time. Further, the
dimension of the resulting ACS is polynomial in the length of the
input λActor program, small enough for the verification problem
to be tractable in many useful cases. The empirical results using
our prototype implementation Soter are encouraging. They demonstrate
that the abstraction framework can be used to prove interesting
safety properties of non-trivial programs automatically. We
believe that the proposed technique can easily be adapted to
accommodate other languages and other abstract models. The level
of generality at which the algorithm is defined seems to support the
definition of a CEGAR loop readily, the formalisation of which is
a topic for future work.
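
To make the coverability question concrete, the following Python sketch implements the textbook backward coverability procedure for a vector addition system given as (pre, post) vectors. It is not the BFC tool that Soter uses, and the three-place net at the end (lock, idle clients, clients in the critical section) is a hypothetical example, not a model generated by Soter.

```python
def covers(marking, target):
    """True if marking >= target componentwise."""
    return all(m >= t for m, t in zip(marking, target))

def pre_basis(target, pre, post):
    """Least marking from which firing (pre, post) yields a marking >= target."""
    return tuple(max(p, t - q + p) for t, p, q in zip(target, pre, post))

def coverable(initial, transitions, target):
    """Backward coverability: can a marking >= target be reached from initial?"""
    basis = [tuple(target)]      # minimal elements of an upward-closed set
    frontier = [tuple(target)]
    while frontier:
        marking = frontier.pop()
        if covers(initial, marking):
            return True
        for pre, post in transitions:
            pred = pre_basis(marking, pre, post)
            if any(covers(pred, b) for b in basis):
                continue         # already inside the upward closure
            basis = [b for b in basis if not covers(b, pred)] + [pred]
            frontier.append(pred)
    return False

# Places: (free lock, idle clients, clients in critical section).
acquire = ((1, 1, 0), (0, 0, 1))
release = ((0, 0, 1), (1, 1, 0))
net = [acquire, release]
initial = (1, 2, 0)

one_in_critical = coverable(initial, net, (0, 0, 1))
two_in_critical = coverable(initial, net, (0, 0, 2))
```

In this toy net a marking with two clients in the critical section is not coverable, which is the shape of the mutual exclusion queries discussed above. Termination of the loop relies on Dickson's lemma: the basis can only grow by markings incomparable with all earlier ones.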
A. Abstract Domains, Orders, Abstraction Functions and Abstract Auxiliary Functions

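Abstract Domains, Orders and Abstraction Functions:

\[
\begin{aligned}
\widehat{\mathit{Pid}} &:= \mathit{ProgLoc} \times \widehat{\mathit{Time}} \\
{\le_{pid}} &:= {=} \times {\le_t} \\
\alpha_{pid} &:= \mathrm{id} \times \alpha_t \\[4pt]
\widehat{\mathit{VAddr}} &:= \widehat{\mathit{Pid}} \times \mathit{Var} \times \widehat{\mathit{Data}} \times \widehat{\mathit{Time}} \\
{\le_{va}} &:= {\le_{pid}} \times {=} \times {\le_d} \times {\le_t} \\
\alpha_{va} &:= \alpha_{pid} \times \mathrm{id} \times \alpha_d \times \alpha_t \\[4pt]
\widehat{\mathit{Env}} &:= \mathit{Var} \rightharpoonup \widehat{\mathit{VAddr}} \\
\hat\rho \le_{env} \hat\rho' &\iff \forall x \in \mathit{Var} .\ \hat\rho(x) \le_{va} \hat\rho'(x) \\
\alpha_{env}(\rho)(x) &:= \alpha_{va}(\rho(x)) \\[4pt]
\widehat{\mathit{KAddr}} &:= \widehat{\mathit{Pid}} \times \mathit{ProgLoc} \times \widehat{\mathit{Env}} \times \widehat{\mathit{Time}} \\
{\le_{ka}} &:= {\le_{pid}} \times {=} \times {\le_{env}} \times {\le_t} \\
\alpha_{ka} &:= \alpha_{pid} \times \mathrm{id} \times \alpha_{env} \times \alpha_t \\[4pt]
\widehat{\mathit{Closure}} &:= \mathit{ProgLoc} \times \widehat{\mathit{Env}} \\
{\le_{cl}} &:= {=} \times {\le_{env}} \\
\alpha_{cl} &:= \mathrm{id} \times \alpha_{env} \\[4pt]
\widehat{\mathit{Value}} &:= \widehat{\mathit{Closure}} \uplus \widehat{\mathit{Pid}} \\
{\le_{val}} &:= {\le_{cl}} + {\le_{pid}} \\
\alpha_{val} &:= \alpha_{cl} + \alpha_{pid} \\[4pt]
\widehat{\mathit{ProcState}} &:= (\mathit{ProgLoc} \uplus \widehat{\mathit{Pid}}) \times \widehat{\mathit{Env}} \times \widehat{\mathit{KAddr}} \times \widehat{\mathit{Time}} \\
{\le_{ps}} &:= ({=} + {\le_{pid}}) \times {\le_{env}} \times {\le_{ka}} \times {\le_t} \\
\alpha_{ps} &:= (\mathrm{id} + \alpha_{pid}) \times \alpha_{env} \times \alpha_{ka} \times \alpha_t \\[4pt]
\widehat{\mathit{Procs}} &:= \widehat{\mathit{Pid}} \to \mathcal{P}(\widehat{\mathit{ProcState}}) \\
\hat\pi \le_{proc} \hat\pi' &\iff \forall \hat\iota \in \widehat{\mathit{Pid}} .\ \hat\pi(\hat\iota) \subseteq \hat\pi'(\hat\iota) \\
\alpha_{proc}(\pi)(\hat\iota) &:= \{ \alpha_{ps}(\pi(\iota)) \mid \alpha_{pid}(\iota) = \hat\iota \} \\[4pt]
\widehat{\mathit{Mailboxes}} &:= \widehat{\mathit{Pid}} \to \widehat{\mathit{Mailbox}} \\
\hat\mu \le_{ms} \hat\mu' &\iff \forall \hat\iota \in \widehat{\mathit{Pid}} .\ \hat\mu(\hat\iota) \le_m \hat\mu'(\hat\iota) \\
\alpha_{ms}(\mu)(\hat\iota) &:= \bigsqcup \{ \alpha_m(\mu(\iota)) \mid \alpha_{pid}(\iota) = \hat\iota \} \\[4pt]
\widehat{\mathit{Store}} &:= (\widehat{\mathit{VAddr}} \to \mathcal{P}(\widehat{\mathit{Value}})) \times (\widehat{\mathit{KAddr}} \to \mathcal{P}(\widehat{\mathit{Kont}})) \\
\hat\sigma \le_{st} \hat\sigma' &\iff \forall \hat b \in \widehat{\mathit{VAddr}} .\ \hat\sigma(\hat b) \subseteq \hat\sigma'(\hat b) \ \wedge\ \forall \hat a \in \widehat{\mathit{KAddr}} .\ \hat\sigma(\hat a) \subseteq \hat\sigma'(\hat a) \\
\alpha_{st}(\sigma)(\hat b) &:= \{ \alpha_{val}(\sigma(b)) \mid \alpha_{va}(b) = \hat b \}, \quad \hat b \in \widehat{\mathit{VAddr}} \\
\alpha_{st}(\sigma)(\hat a) &:= \{ \alpha_{kont}(\sigma(a)) \mid \alpha_{ka}(a) = \hat a \}, \quad \hat a \in \widehat{\mathit{KAddr}} \\[4pt]
\widehat{\mathit{State}} &:= \widehat{\mathit{Procs}} \times \widehat{\mathit{Mailboxes}} \times \widehat{\mathit{Store}} \\
{\le} &:= {\le_{proc}} \times {\le_{ms}} \times {\le_{st}} \\
\alpha_{cfa} &:= \alpha_{proc} \times \alpha_{ms} \times \alpha_{st}
\end{aligned}
\]

where we write $f + g := \{ (x, x') \mid (x, x') \in f \text{ or } (x, x') \in g \}$.

Abstract Auxiliary Functions:

\[
\begin{aligned}
\widehat{\mathit{new}}_{kpush} &: \widehat{\mathit{Pid}} \times \widehat{\mathit{ProcState}} \to \widehat{\mathit{KAddr}} \\
\widehat{\mathit{new}}_{kpush}(\hat\iota, \langle \ell, \hat\rho, \_, \hat t \rangle) &:= (\hat\iota, \ell.\mathit{arg}_0, \hat\rho, \hat t) \\[4pt]
\widehat{\mathit{new}}_{kpop} &: \widehat{\mathit{Pid}} \times \widehat{\mathit{Kont}} \times \widehat{\mathit{ProcState}} \to \widehat{\mathit{KAddr}} \\
\widehat{\mathit{new}}_{kpop}(\hat\iota, \hat\kappa, \langle \_, \_, \_, \hat t \rangle) &:= (\hat\iota, \ell.\mathit{arg}_{i+1}, \hat\rho, \hat t)
  \quad \text{where } \hat\kappa = \mathrm{Arg}_i\langle \ell, \ldots, \hat\rho, \_ \rangle \\[4pt]
\widehat{\mathit{new}}_{va} &: \widehat{\mathit{Pid}} \times \mathit{Var} \times \widehat{\mathit{Data}} \times \widehat{\mathit{ProcState}} \to \widehat{\mathit{VAddr}} \\
\widehat{\mathit{new}}_{va}(\hat\iota, x, \hat\delta, \langle \_, \_, \_, \hat t \rangle) &:= (\hat\iota, x, \hat\delta, \hat t) \\[4pt]
\widehat{\mathit{new}}_{pid} &: \widehat{\mathit{Pid}} \times \mathit{ProgLoc} \times \widehat{\mathit{Time}} \to \widehat{\mathit{Pid}} \\
\widehat{\mathit{new}}_{pid}((\ell', \hat t'), \ell, \hat t) &:= (\ell, \widehat{\mathit{tick}}^{*}(\hat t, \widehat{\mathit{tick}}(\ell', \hat t')))
\end{aligned}
\]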
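
As an informal illustration of the abstraction functions for process maps and mailbox maps, here is a minimal Python sketch. It assumes a 0-CFA-like instance in which the abstract pid simply forgets the time stamp, and it models an abstract mailbox as a set of messages joined by union; both choices, and all concrete names such as `worker` and `ack`, are assumptions of this sketch rather than the instantiation used by Soter.

```python
def alpha_pid(pid):
    """Abstract a pid (program location, time) by dropping the time stamp."""
    loc, _time = pid
    return loc

def alpha_proc(procs, alpha_ps=lambda q: q):
    """Collect, per abstract pid, the set of abstract states of its concrete pids."""
    abstract = {}
    for pid, state in procs.items():
        abstract.setdefault(alpha_pid(pid), set()).add(alpha_ps(state))
    return abstract

def alpha_ms(mailboxes, alpha_m=frozenset):
    """Join (here: union) the abstract mailboxes of all pids in the same class."""
    abstract = {}
    for pid, box in mailboxes.items():
        key = alpha_pid(pid)
        abstract[key] = abstract.get(key, frozenset()) | alpha_m(box)
    return abstract

# Hypothetical concrete configuration: two workers spawned at the same location.
procs = {("worker", 1): "waiting", ("worker", 2): "running", ("server", 0): "waiting"}
mailboxes = {("worker", 1): ["ack"], ("worker", 2): ["ack", "nack"], ("server", 0): []}

procs_hat = alpha_proc(procs)
boxes_hat = alpha_ms(mailboxes)
```

Both workers collapse into one abstract pid, so their states are merged into a set and their mailboxes are merged by the join; this is the "merging" of mailboxes within a pid-class mentioned in the discussion of related work.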
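Concrete and Abstract Match Function:

\[
\begin{aligned}
\mathit{match}_{\rho,\sigma}(p_i, (x, \rho')) &= \mathit{match}_{\rho,\sigma}(p_i, \sigma(\rho'(x))) \\
\mathit{match}_{\rho,\sigma}(x, d) &= \{ x \mapsto d \} \quad \text{if } x \notin \mathrm{dom}(\rho) \\
\mathit{match}_{\rho,\sigma}(x, d) &= \{ x \mapsto d \} \quad \text{if } \mathit{match}_{\rho',\sigma}(p', d) \neq \bot
   \text{ where } (p', \rho') = \sigma(\rho(x)) \\
\mathit{match}_{\rho,\sigma}(p, (t, \rho')) &= \bigotimes_{1 \le i \le n} \mathit{match}_{\rho,\sigma}(p_i, (t_i, \rho'))
   \quad \text{where } p = c(p_1, \ldots, p_n),\ t = c(t_1, \ldots, t_n) \\
\mathit{match}_{\rho,\sigma}(p, d) &= \bot \quad \text{otherwise}
\end{aligned}
\]

\[
\theta \otimes \theta' =
\begin{cases}
\bot & \text{if } \exists x .\ \theta(x) \neq \theta'(x) \\
\theta \cup \theta' & \text{otherwise}
\end{cases}
\]

\[
\begin{aligned}
\widehat{\mathit{match}}_{\hat\rho,\hat\sigma}(p_i, (x, \hat\rho')) &= \bigcup_{\hat d \in \hat\sigma(\hat\rho'(x))} \widehat{\mathit{match}}_{\hat\rho,\hat\sigma}(p_i, \hat d) \\
\widehat{\mathit{match}}_{\hat\rho,\hat\sigma}(x, \hat d) &= \{ x \mapsto \hat d \} \\
\widehat{\mathit{match}}_{\hat\rho,\hat\sigma}(p, (t, \hat\rho')) &= \widehat{\bigotimes}_{1 \le i \le n} \widehat{\mathit{match}}_{\hat\rho,\hat\sigma}(p_i, (t_i, \hat\rho'))
   \quad \text{if } p = c(p_1, \ldots, p_n) \text{ and } t = c(t_1, \ldots, t_n) \\
\widehat{\mathit{match}}_{\hat\rho,\hat\sigma}(p, \hat d) &= \emptyset \quad \text{otherwise}
\end{aligned}
\]

where
\[
\widehat{\bigotimes}(\{ \Theta_i \mid 1 \le i \le n \}) = \Big\{ \bigotimes_{1 \le i \le n} \theta_i \ \Big|\ \theta_i \in \Theta_i,\ 1 \le i \le n \Big\}.
\]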
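
The combination operator on substitutions and its lifting to sets can be sketched in Python as follows. Substitutions are modelled as dictionaries and the failed match ⊥ as `None`. Three points are assumptions of this sketch, not statements from the definitions above: disagreement is only checked on variables bound by both substitutions, ⊥ is propagated when either argument is already ⊥, and the lifted operator silently drops failed combinations.

```python
from itertools import product

BOTTOM = None  # stands for the failed match, written ⊥ in the text

def combine(theta1, theta2):
    """θ ⊗ θ': union of two substitutions, or ⊥ if they disagree on a variable."""
    if theta1 is BOTTOM or theta2 is BOTTOM:
        return BOTTOM
    for x in theta1.keys() & theta2.keys():
        if theta1[x] != theta2[x]:
            return BOTTOM
    return {**theta1, **theta2}

def combine_sets(substitution_sets):
    """Lifted ⊗: pick one substitution from each set and combine the choices."""
    results = []
    for choice in product(*substitution_sets):
        theta = {}
        for candidate in choice:
            theta = combine(theta, candidate)
            if theta is BOTTOM:
                break
        if theta is not BOTTOM and theta not in results:
            results.append(theta)
    return results

agree = combine({"X": 1}, {"Y": 2})
clash = combine({"X": 1}, {"X": 2})
lifted = combine_sets([[{"X": 1}, {"X": 2}], [{"X": 1, "Y": 3}]])
```

The lifted version shows why the abstract match returns a set: each sub-pattern may match in several ways in the abstract store, and every consistent combination has to be kept.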



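Lemma 1. Suppose the concrete domain $C = A \rightharpoonup B$ of partial
functions has abstract domain $\hat C = \hat A \to \mathcal{P}(\hat B)$ with the induced
order $\le$ and abstraction function $\alpha_C : C \to \hat C$ as specified in (5).
Then for all $f \in C$ and $\hat f \in \hat C$ with $\alpha_C(f) \le \hat f$

\[
\forall a \in \mathrm{dom}(f) .\ \alpha_B(f(a)) \in \hat f(\alpha_A(a)). \tag{6}
\]

Further suppose $f, f' \in C$ such that $f' = f[a_1 \mapsto b_1, \ldots, a_n \mapsto b_n]$
and let $\hat f, \hat f' \in \hat C$ such that $\hat f' = \hat f \sqcup [\hat a_1 \mapsto \hat b_1, \ldots, \hat a_n \mapsto \hat b_n]$
with $\alpha_C(f) \le \hat f$ and $\alpha_A(a_i) = \hat a_i$, $\alpha_B(b_i) = \hat b_i$ for $i = 1, \ldots, n$;
then

\[
\alpha_C(f') \le \hat f'. \tag{7}
\]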



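Proof. Let $f \in C$ and $\hat f \in \hat C$ such that $\alpha_C(f) \le \hat f$. The definition
of $\le$ implies that for all $\hat a \in \hat A$

\[ \alpha_C(f)(\hat a) \subseteq \hat f(\hat a). \]

Take $a \in A$ and fix $\hat a = \alpha_A(a)$; then we obtain

\[ \alpha_C(f)(\alpha_A(a)) \subseteq \hat f(\alpha_A(a)). \]

Expanding the definition of $\alpha_C$ yields

\[ \{ \alpha_B(b_0) \mid (a_0, b_0) \in f,\ \alpha_A(a_0) = \alpha_A(a) \} \subseteq \hat f(\alpha_A(a)). \]

In particular $\alpha_B(f(a)) \in \{ \alpha_B(b_0) \mid (a_0, b_0) \in f,\ \alpha_A(a_0) = \alpha_A(a) \}$,
which yields what we set out to prove:

\[ \forall a \in \mathrm{dom}(f) .\ \alpha_B(f(a)) \in \hat f(\alpha_A(a)). \]

Turning to equation (7), we want to show $\alpha_C(f') \le \hat f'$. Let $\hat a \in \hat A$;
then there are several cases to consider.

(i) $\alpha_C(f')(\hat a) = \alpha_C(f)(\hat a)$. Then since $\alpha_C(f) \le \hat f \le \hat f'$ we have
$\alpha_C(f')(\hat a) = \alpha_C(f)(\hat a) \subseteq \hat f(\hat a) \subseteq \hat f'(\hat a)$.

(ii) $\hat a = \alpha_A(a_i)$ for some $1 \le i \le n$. Then $\hat a = \hat a_i$ and thus

\[
\begin{aligned}
\alpha_C(f')(\hat a) &= \{ \alpha_B(b_i) \} \cup \{ \alpha_B(f(a)) \mid \alpha_A(a) = \hat a,\ a \neq a_i \} \\
  &\subseteq \{ \hat b_i \} \cup \alpha_C(f)(\hat a) \\
  &\subseteq \{ \hat b_i \} \cup \hat f(\hat a) \subseteq \hat f'(\hat a).
\end{aligned}
\]

(iii) Otherwise there does not exist $(a, b) \in f'$ such that $\alpha_A(a) = \hat a$,
and hence $\alpha_C(f')(\hat a) = \emptyset$, which makes our claim trivially true.

We can thus conclude that $\alpha_C(f') \le \hat f'$. □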
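
Claim (7) can be checked on a small finite instance. The Python sketch below is only an illustration under assumed abstractions: integers are abstracted by parity on the argument side and by sign on the result side, and none of these names come from the paper. It applies a concrete update and the corresponding abstract join, then compares the two sides.

```python
def alpha_c(f, alpha_a, alpha_b):
    """Abstract a finite partial function to a map from abstract points to sets."""
    out = {}
    for a, b in f.items():
        out.setdefault(alpha_a(a), set()).add(alpha_b(b))
    return out

def leq(f_hat, g_hat):
    """Pointwise subset order on abstract functions."""
    return all(values <= g_hat.get(key, set()) for key, values in f_hat.items())

def join_update(f_hat, updates):
    """Abstract (weak) update: add new abstract values, never remove old ones."""
    out = {key: set(values) for key, values in f_hat.items()}
    for a_hat, b_hat in updates:
        out.setdefault(a_hat, set()).add(b_hat)
    return out

def parity(n):          # assumed abstraction of arguments
    return n % 2

def sign(n):            # assumed abstraction of results
    return "neg" if n < 0 else "pos"

f = {1: 5, 2: -3}
f_hat = alpha_c(f, parity, sign)

updates = {1: -7, 4: 2}                      # concrete update f[1 -> -7, 4 -> 2]
f_prime = {**f, **updates}
f_hat_prime = join_update(f_hat, [(parity(a), sign(b)) for a, b in updates.items()])

sound = leq(alpha_c(f_prime, parity, sign), f_hat_prime)
strict = alpha_c(f_prime, parity, sign) != f_hat_prime
```

The abstract side still contains the stale value `"pos"` at the odd arguments, so the inequality in (7) is strict here; the lemma only promises over-approximation, not equality.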

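Corollary 3. Let $\pi \in \mathit{Procs}$ and $\hat\pi \in \widehat{\mathit{Procs}}$ such that $\alpha_{proc}(\pi) \le \hat\pi$,
let $\sigma \in \mathit{Store}$ and $\hat\sigma \in \widehat{\mathit{Store}}$ such that $\alpha_{st}(\sigma) \le \hat\sigma$, and let
$\mu \in \mathit{Mailboxes}$ and $\hat\mu \in \widehat{\mathit{Mailboxes}}$ such that $\alpha_{ms}(\mu) \le \hat\mu$. Then

(i) $\forall \iota \in \mathit{Pid} .\ \alpha_{ps}(\pi(\iota)) \in \hat\pi(\alpha_{pid}(\iota))$
(ii) $\forall b \in \mathit{VAddr} .\ \alpha_{val}(\sigma(b)) \in \hat\sigma(\alpha_{va}(b))$
(iii) $\forall a \in \mathit{KAddr} .\ \alpha_{kont}(\sigma(a)) \in \hat\sigma(\alpha_{ka}(a))$
(iv) $\forall \iota \in \mathit{Pid} .\ \alpha_m(\mu(\iota)) \le \hat\mu(\alpha_{pid}(\iota))$
(v) $\forall \iota \in \mathit{Pid} .\ \forall x \in \mathit{Var} .\ \forall \delta \in \mathit{Data} .\ \forall q \in \mathit{ProcState} .$
    $\alpha_{va}(\mathit{new}_{va}(\iota, x, \delta, q)) = \widehat{\mathit{new}}_{va}(\alpha_{pid}(\iota), x, \alpha_d(\delta), \alpha_{ps}(q))$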

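Proof. Cases (i)-(iii) follow directly from Lemma 1; it remains to
show the claims of (iv) and (v).

(iv) By assumption $\alpha_{ms}(\mu) \le \hat\mu$, which implies that

\[ \alpha_{ms}(\mu)(\alpha_{pid}(\iota)) \le \hat\mu(\alpha_{pid}(\iota)) = \hat\mu(\hat\iota). \]

Expanding $\alpha_{ms}$ then gives us that $\alpha_m(\mu(\iota)) \le \alpha_{ms}(\mu)(\alpha_{pid}(\iota))$,
since $\alpha_{ms}(\mu) = \lambda \hat\iota .\ \bigsqcup \{ \alpha_m(\mu(\iota)) \mid \alpha_{pid}(\iota) = \hat\iota \}$, which allows us
to conclude

\[ \alpha_m(\mu(\iota)) \le \hat\mu(\hat\iota). \]

(v) The claim follows straightforwardly from expanding $\mathit{new}_{va}$ and
$\widehat{\mathit{new}}_{va}$:

\[
\begin{aligned}
\alpha_{va}(\mathit{new}_{va}(\iota, x, \delta, q)) &= (\alpha_{pid}(\iota), x, \alpha_d(\delta), \alpha_t(t)) \\
  &= \widehat{\mathit{new}}_{va}(\alpha_{pid}(\iota), x, \alpha_d(\delta), \alpha_{ps}(q))
\end{aligned}
\]

where $q = \langle e, \rho, a, t \rangle$.

□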

B. Proof of Theorem 1 

Proof of Theorem 1. The proof is by a case analysis of the rule that
defines the concrete transition $s \to s'$. For each rule, the transition
in the concrete system can be replicated in the abstract transition
system using the abstract version of the rule, with the appropriate
choice of abstract pid, the continuation from the abstract store, the
message from the abstract mailbox, etc.

Let $s = \langle \pi, \mu, \sigma \rangle \to \langle \pi', \mu', \sigma' \rangle = s'$ and $\hat u = \langle \hat\pi, \hat\mu, \hat\sigma \rangle$ such
that $\alpha_{proc}(\pi) \le \hat\pi$, $\alpha_{ms}(\mu) \le \hat\mu$, and $\alpha_{st}(\sigma) \le \hat\sigma$. We consider a
number of rules for illustration.



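Case: (Send). We know that $s \to s'$ using rule Send; we can thus
assume

\[
\begin{aligned}
\pi(\iota) &= \langle v, \rho, a, t \rangle \\
\sigma(a) &= \mathrm{Arg}_2\langle \ell, d, \iota', \_, c \rangle \\
d &= (\mathtt{send}, \_)
\end{aligned}
\]

and for $s'$

\[
\begin{aligned}
\pi' &= \pi[\iota \mapsto \langle v, \rho, c, t \rangle] \\
\mu' &= \mu[\iota' \mapsto \mathit{enq}((v, \rho), \mu(\iota'))] \\
\sigma' &= \sigma.
\end{aligned}
\]

As a first step we will examine $\hat u$ and show that $\hat u \leadsto \hat u'$ for some
$\hat u'$. For $\hat\pi$ and $\hat\sigma$, writing $\hat\iota := \alpha_{pid}(\iota)$, Corollary 3 gives us

\[
\begin{aligned}
\langle v, \hat\rho, \hat a, \hat t \rangle &\in \hat\pi(\hat\iota) \\
\mathrm{Arg}_2\langle \ell, \hat d, \hat\iota', \_, \hat c \rangle &\in \hat\sigma(\hat a)
\end{aligned}
\]

where $\alpha_{env}(\rho) = \hat\rho$, $\alpha_{ka}(a) = \hat a$, $\hat d = \alpha_{val}(d)$, $\hat t = \alpha_t(t)$,
$\hat\iota' = \alpha_{pid}(\iota')$ and $\hat c = \alpha_{ka}(c)$. Rule AbsSend is now applicable
and we can set

\[
\begin{aligned}
\hat\pi' &:= \hat\pi \sqcup [\hat\iota \mapsto \hat q] \\
\hat q &:= \langle v, \hat\rho, \hat c, \hat t \rangle \\
\hat\mu' &:= \hat\mu[\hat\iota' \mapsto \widehat{\mathit{enq}}((v, \hat\rho), \hat\mu(\hat\iota'))] \\
\hat u' &:= \langle \hat\pi', \hat\mu', \hat\sigma \rangle.
\end{aligned}
\]

It follows from rule AbsSend that $\hat u \leadsto \hat u'$. It remains to show
that $\alpha_{cfa}(s') \le \hat u'$, which follows directly from (i) $\alpha_{proc}(\pi') \le \hat\pi'$
and (ii) $\alpha_{ms}(\mu') \le \hat\mu'$.

(i) $\alpha_{proc}(\pi') \le \hat\pi'$ follows immediately from Lemma 1 since
$\hat\iota = \alpha_{pid}(\iota)$ and $\alpha_{ps}(\langle v, \rho, c, t \rangle) = \hat q$.

(ii) $\alpha_{ms}(\mu') \le \hat\mu'$. It is sufficient to show that $\alpha_{pid}(\iota') = \hat\iota'$, which
is immediate, and $\alpha_m(\mu'(\iota')) \le \hat\mu'(\hat\iota')$. For the latter, since
$\alpha_{env}(\rho) = \hat\rho$, a sound basic domain abstraction gives us

\[
\begin{aligned}
\alpha_m(\mu'(\iota')) &= \alpha_m(\mathit{enq}((v, \rho), \mu(\iota'))) \\
  &\le \widehat{\mathit{enq}}((v, \hat\rho), \hat\mu(\hat\iota')) = \hat\mu'(\hat\iota')
\end{aligned}
\]

provided we can show $\alpha_m(\mu(\iota')) \le \hat\mu(\hat\iota')$; the latter inequality
follows from Corollary 3. Hence we can conclude $\alpha_{ms}(\mu') \le \hat\mu'$,
which completes the proof of this case.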

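Case: (Receive). In the concrete system $s \to s'$ using rule Receive, hence we
can make the following assumptions

\[
\begin{aligned}
\pi(\iota) &= \langle \mathtt{receive}\ p_1 \to e_1 \ldots p_n \to e_n\ \mathtt{end}, \rho, a, t \rangle =: q \\
(i, \theta, m) &= \mathit{mmatch}(p_1 \ldots p_n, \mu(\iota), \rho, \sigma) \\
\theta &= [x_1 \mapsto d_1 \ldots x_k \mapsto d_k] \\
b_j &= \mathit{new}_{va}(\iota, x_j, \delta_j, q) \\
\delta_j &= \mathit{res}(\sigma, d_j)
\end{aligned}
\]

and for state $s'$

\[
\begin{aligned}
\pi' &= \pi[\iota \mapsto q'] \\
q' &= \langle e_i, \rho', a, t \rangle \\
\rho' &= \rho[x_1 \mapsto b_1 \ldots x_k \mapsto b_k] \\
\mu' &= \mu[\iota \mapsto m] \\
\sigma' &= \sigma[b_1 \mapsto d_1 \ldots b_k \mapsto d_k].
\end{aligned}
\]

As a first step we will look at $\hat u$ to prove that there exists a $\hat u'$
such that $\hat u \leadsto \hat u'$ using rule AbsReceive. We can invoke Corollary 3,
since $\alpha_{proc}(\pi) \le \hat\pi$, to obtain

\[ \hat q := \langle \mathtt{receive}\ p_1 \to e_1 \ldots p_n \to e_n\ \mathtt{end}, \hat\rho, \hat a, \hat t \rangle \in \hat\pi(\hat\iota) \]

where we write $\hat\rho := \alpha_{env}(\rho)$, $\hat a := \alpha_{ka}(a)$, $\hat t := \alpha_t(t)$ and
$\hat\iota := \alpha_{pid}(\iota)$. Moreover Corollary 3 gives us

\[ \alpha_m(\mu(\iota)) \le \hat\mu(\hat\iota) \]

as $\alpha_{ms}(\mu) \le \hat\mu$ and $\hat\iota = \alpha_{pid}(\iota)$. Since the instantiation of the basic
domains is sound and $\alpha_{st}(\sigma) \le \hat\sigma$ we then know that

\[ (i, \hat\theta, \hat m) \in \widehat{\mathit{mmatch}}(p_1 \ldots p_n, \hat\mu(\hat\iota), \hat\rho, \hat\sigma) \]

such that $\hat\theta = \alpha_{sub}(\theta)$ and $\hat m \ge \alpha_m(m)$. Turning to the substitution
$\hat\theta$ we can see that

\[ \hat\theta = [x_1 \mapsto \hat d_1 \ldots x_k \mapsto \hat d_k] \]

where $\hat d_j = \alpha_{val}(d_j)$ for $1 \le j \le k$. Appealing to the sound basic
domain instantiation once more, noting that $\alpha_{st}(\sigma) \le \hat\sigma$, yields
that for $j = 1, \ldots, k$ we have $\hat\delta_j := \alpha_d(\delta_j) \in \widehat{\mathit{res}}(\hat\sigma, \hat d_j)$;
to obtain new abstract variable addresses we can now set
$\hat b_j := \widehat{\mathit{new}}_{va}(\hat\iota, x_j, \hat\delta_j, \hat q)$. Rule AbsReceive is applicable now; we make
the following definitions

\[
\begin{aligned}
\hat\pi' &:= \hat\pi \sqcup [\hat\iota \mapsto \hat q'] \\
\hat q' &:= \langle e_i, \hat\rho', \hat a, \hat t \rangle \\
\hat\rho' &:= \hat\rho[x_1 \mapsto \hat b_1 \ldots x_k \mapsto \hat b_k] \\
\hat\mu' &:= \hat\mu[\hat\iota \mapsto \hat m] \\
\hat\sigma' &:= \hat\sigma \sqcup [\hat b_1 \mapsto \hat d_1 \ldots \hat b_k \mapsto \hat d_k] \\
\hat u' &:= \langle \hat\pi', \hat\mu', \hat\sigma' \rangle
\end{aligned}
\]

and observe that $\hat u \leadsto \hat u'$. It remains to show $\alpha_{cfa}(s') \le \hat u'$, which
follows directly if we can prove (i) $\alpha_{proc}(\pi') \le \hat\pi'$, (ii) $\alpha_{ms}(\mu') \le \hat\mu'$
and (iii) $\alpha_{st}(\sigma') \le \hat\sigma'$.

(i) $\alpha_{proc}(\pi') \le \hat\pi'$. We note that by Corollary 3 we know
$\hat b_j = \alpha_{va}(b_j)$, as $\hat\iota = \alpha_{pid}(\iota)$, $\hat\delta_j = \alpha_d(\delta_j)$ and $\alpha_{ps}(q) = \hat q$, for
$1 \le j \le k$. It follows that $\hat\rho' = \alpha_{env}(\rho')$ and hence
$\hat q' = \alpha_{ps}(q')$. Lemma 1 is now applicable, since $\hat\iota = \alpha_{pid}(\iota)$, to give
$\alpha_{proc}(\pi') \le \hat\pi'$.

(ii) $\alpha_{ms}(\mu') \le \hat\mu'$. It is sufficient to show that $\hat\iota = \alpha_{pid}(\iota)$, which is
immediate, and $\alpha_m(m) \le \hat m$, which we have already established
above; hence we can conclude $\alpha_{ms}(\mu') \le \hat\mu'$.

(iii) $\alpha_{st}(\sigma') \le \hat\sigma'$. The observation that $\hat b_j = \alpha_{va}(b_j)$ and
$\hat d_j = \alpha_{val}(d_j)$ allows the application of Lemma 1, which gives
$\alpha_{st}(\sigma') \le \hat\sigma'$ as desired.

This completes the proof of this case.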

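Case: (Apply). Since $s \to s'$ using rule Apply we can assume that

\[
\begin{aligned}
\pi(\iota) &= \langle v, \rho, a, t \rangle =: q \\
\sigma(a) &= \mathrm{Arg}_n\langle \ell, d_0 \ldots d_{n-1}, \rho', c \rangle =: \kappa \\
\mathit{arity}(\ell) &= n \\
d_0 &= (\mathtt{fun}(x_1 \ldots x_n) \to e, \rho_0) \\
d_n &= (v, \rho)
\end{aligned}
\]

and for $i = 1, \ldots, n$

\[
\begin{aligned}
\delta_i &= \mathit{res}(\sigma, d_i) \\
b_i &= \mathit{new}_{va}(\iota, x_i, \delta_i, q);
\end{aligned}
\]

additionally for the successor state $s'$

\[
\begin{aligned}
\pi' &= \pi[\iota \mapsto q'] \quad \text{where } q' := \langle e, \rho'[x_1 \to b_1 \ldots x_n \to b_n], c, \mathit{tick}(\ell, t) \rangle \\
\sigma' &= \sigma[b_1 \mapsto d_1 \ldots b_n \mapsto d_n] \\
\mu' &= \mu.
\end{aligned}
\]

As a first step we will examine $\hat u$ and show that there exists a $\hat u'$
such that $\hat u \leadsto \hat u'$ using rule AbsApply. From Corollary 3, since
$\alpha_{proc}(\pi) \le \hat\pi$, it follows that

\[ \hat q := \langle v, \alpha_{env}(\rho), \alpha_{ka}(a), \alpha_t(t) \rangle \in \hat\pi(\alpha_{pid}(\iota)). \]

Letting $\hat\rho := \alpha_{env}(\rho)$, $\hat a := \alpha_{ka}(a)$, $\hat t := \alpha_t(t)$ and $\hat\iota := \alpha_{pid}(\iota)$ we
can appeal to Corollary 3 again, as $\alpha_{st}(\sigma) \le \hat\sigma$, to obtain

\[ \mathrm{Arg}_n\langle \ell, \hat d_0 \ldots \hat d_{n-1}, \hat\rho', \hat c \rangle \in \hat\sigma(\hat a) \]

where we write $\hat d_i := \alpha_{val}(d_i)$ for $0 \le i < n$, $\hat\rho' := \alpha_{env}(\rho')$ and
$\hat c := \alpha_{ka}(c)$. Expanding $\alpha_{val}$ yields

\[ \hat d_0 = (\mathtt{fun}(x_1 \ldots x_n) \to e, \hat\rho_0) \]

where we write $\hat\rho_0 := \alpha_{env}(\rho_0)$. Taking $\hat d_n := (v, \hat\rho)$ we obtain
from our sound basic domain abstraction

\[ \hat\delta_i := \alpha_d(\delta_i) \in \widehat{\mathit{res}}(\hat\sigma, \hat d_i) \quad \text{for } i = 1, \ldots, n \]

as $\alpha_{st}(\sigma) \le \hat\sigma$ and $\hat d_i = \alpha_{val}(d_i)$. Turning to the abstract variable
addresses we define

\[ \hat b_i := \widehat{\mathit{new}}_{va}(\hat\iota, x_i, \hat\delta_i, \hat q) \quad \text{for } 1 \le i \le n. \]

Rule AbsApply is now applicable and we define

\[
\begin{aligned}
\hat\pi' &:= \hat\pi \sqcup [\hat\iota \mapsto \hat q'] \\
\hat q' &:= \langle e, \hat\rho'[x_1 \to \hat b_1 \ldots x_n \to \hat b_n], \hat c, \widehat{\mathit{tick}}(\ell, \hat t) \rangle \\
\hat\sigma' &:= \hat\sigma \sqcup [\hat b_1 \mapsto \hat d_1 \ldots \hat b_n \mapsto \hat d_n] \\
\hat u' &:= \langle \hat\pi', \hat\mu, \hat\sigma' \rangle.
\end{aligned}
\]

It is clear from rule AbsApply that $\hat u \leadsto \hat u'$; it remains to show that
$\alpha_{cfa}(s') \le \hat u'$ to prove this case. The latter follows if we can justify
(i) $\alpha_{proc}(\pi') \le \hat\pi'$ and (ii) $\alpha_{st}(\sigma') \le \hat\sigma'$.

(i) $\alpha_{proc}(\pi') \le \hat\pi'$. We can appeal to Lemma 1 provided we can
show that $\hat\iota = \alpha_{pid}(\iota)$ and $\alpha_{ps}(q') = \hat q'$, where the former
is immediate. For the latter, first observe that since we have
a sound basic domain abstraction we know $\alpha_t(\mathit{tick}(\ell, t)) \le \widehat{\mathit{tick}}(\ell, \hat t)$;
however, as $\widehat{\mathit{Time}}$ is a flat domain, the above
inequality is in fact an equality

\[ \alpha_t(\mathit{tick}(\ell, t)) = \widehat{\mathit{tick}}(\ell, \hat t). \]

Moreover $\hat b_i = \alpha_{va}(b_i)$ for $1 \le i \le n$ by Corollary 3, hence

\[ \alpha_{env}(\rho'[x_1 \to b_1 \ldots x_n \to b_n]) = \hat\rho'[x_1 \to \hat b_1 \ldots x_n \to \hat b_n] \]

as $\alpha_{env}(\rho') = \hat\rho'$; in combination with $\hat c = \alpha_{ka}(c)$ we obtain the
desired $\alpha_{ps}(q') = \hat q'$. Thus we conclude that $\alpha_{proc}(\pi') \le \hat\pi'$.

(ii) $\alpha_{st}(\sigma') \le \hat\sigma'$. Since $\alpha_{va}(b_i) = \hat b_i$ and $\alpha_{val}(d_i) = \hat d_i$ for
$1 \le i \le n$, Lemma 1 is applicable once more and gives us
$\alpha_{st}(\sigma') \le \hat\sigma'$, which completes the proof of this case.

□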

C. Proof of Theorem 3 

Terminology. Analogously to our remark in section 6 on active
components for rules AbsR in the abstract operational semantics, it
is possible to identify a similar pattern in the concrete operational
semantics. Henceforth we will speak of the concrete active component
$(\iota, q, q')$ of a rule R of the concrete operational semantics and
of the abstract active component $(\hat\iota, \hat q, \hat q')$ of a rule AbsR
of the abstract operational semantics, where AbsR is the abstract
counterpart of R. We will omit the adjectives abstract and concrete
when there is no confusion.

Lemma 2. Suppose $s \to s'$ using the concrete rule R with concrete
active component $(\iota, q, q')$ and $\hat s \ge \alpha_{cfa}(s)$. Then $\hat s \leadsto \hat s'$ with
$\hat s' \ge \alpha_{cfa}(s')$ using rule AbsR with abstract active component
$(\alpha_{pid}(\iota), \alpha_{ps}(q), \alpha_{ps}(q'))$.

Proof. The claim follows from inspection of the proof of Theo- 
rem 1. □ 

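Proof of Theorem 3. Suppose $s \to s'$ using rule R of the concrete
operational semantics with active component $(\iota, q, q')$. We will
prove our claim by case analysis on R.

- R = FunEval, ArgEval, Apply or Vars. Take $\hat s = \alpha_{cfa}(s)$;
Lemma 2 gives us that $\hat s \leadsto \hat s'$ using abstract rule AbsR =
AbsFunEval, AbsArgEval, AbsApply or AbsVars respectively with
active component $(\hat\iota, \hat q, \hat q')$ where $\hat\iota = \alpha_{pid}(\iota)$, $\hat q = \alpha_{ps}(q)$ and
$\hat q' = \alpha_{ps}(q')$. It follows that

\[ r := \hat\iota : \hat q \xrightarrow{\tau} \hat q' \in R. \]

Since $s \to s'$ with active component $(\iota, q, q')$ it follows that

(i) $\alpha_{acs}(s)(\hat\iota, \hat q) \ge 1$,
(ii) $\alpha_{acs}(s')(\hat\iota, \hat q) = \alpha_{acs}(s)(\hat\iota, \hat q) - 1$ and
(iii) $\alpha_{acs}(s')(\hat\iota, \hat q') = \alpha_{acs}(s)(\hat\iota, \hat q') + 1$

as $\hat\iota = \alpha_{pid}(\iota)$, $\hat q = \alpha_{ps}(q)$ and $\hat q' = \alpha_{ps}(q')$. We know
$\alpha_{acs}(s) \le \mathbf{v}$ and thus

\[ \mathbf{v}(\hat\iota, \hat q) \ge 1. \]

If we define

\[ \mathbf{v}' := \mathbf{v}[(\hat\iota, \hat q) \mapsto \mathbf{v}(\hat\iota, \hat q) - 1,\ (\hat\iota, \hat q') \mapsto \mathbf{v}(\hat\iota, \hat q') + 1], \]

then it is clear that $\mathbf{v} \to_{acs} \mathbf{v}'$ using rule $r \in R$ and the
inequalities

\[
\begin{aligned}
\alpha_{acs}(s')(\hat\iota, \hat q) &= \alpha_{acs}(s)(\hat\iota, \hat q) - 1 \le \mathbf{v}(\hat\iota, \hat q) - 1 = \mathbf{v}'(\hat\iota, \hat q) \\
\alpha_{acs}(s')(\hat\iota, \hat q') &= \alpha_{acs}(s)(\hat\iota, \hat q') + 1 \le \mathbf{v}(\hat\iota, \hat q') + 1 = \mathbf{v}'(\hat\iota, \hat q');
\end{aligned}
\]

the consequence of the latter two is that $\alpha_{acs}(s') \le \mathbf{v}'$, since
$\alpha_{acs}(s) \le \mathbf{v}$, which completes the proof of this case.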

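- R = Receive. Letting $\hat s = \alpha_{cfa}(s)$, Lemma 2 yields that $\hat s \leadsto \hat s'$
using abstract rule AbsReceive with active component $(\hat\iota, \hat q, \hat q')$
where $\hat\iota = \alpha_{pid}(\iota)$, $\hat q = \alpha_{ps}(q)$ and $\hat q' = \alpha_{ps}(q')$. We note that
$s = \langle \pi, \mu, \sigma \rangle$ and $\hat s = \langle \hat\pi, \hat\mu, \hat\sigma \rangle$ where $\hat\pi = \alpha_{proc}(\pi)$,
$\hat\sigma = \alpha_{st}(\sigma)$ and $\hat\mu = \alpha_{ms}(\mu)$. Let the message matched by mmatch
and extracted from $\mu(\iota)$ be $d = (p_i, \rho')$; then inspecting rule
AbsReceive we can assume that during $\hat s \leadsto \hat s'$ message
$\hat d = (p_i, \hat\rho')$, where $\hat\rho' = \alpha_{env}(\rho')$, is matched by $\widehat{\mathit{mmatch}}$. Since
the message abstraction is a sound data abstraction we know
that

\[ \hat m := \alpha_{msg}(\mathit{res}(\sigma, d)) \in \widehat{\mathit{res}}_{msg}(\hat\sigma, \hat d) \]

and hence we have

\[ r := \hat\iota : \hat q \xrightarrow{?\hat m} \hat q' \in R. \]

Additionally we know

(i) $\alpha_{acs}(s)(\hat\iota, \hat q) \ge 1$,
(ii) $\alpha_{acs}(s)(\hat\iota, \hat m) \ge 1$,
(iii) $\alpha_{acs}(s')(\hat\iota, \hat q) = \alpha_{acs}(s)(\hat\iota, \hat q) - 1$,
(iv) $\alpha_{acs}(s')(\hat\iota, \hat q') = \alpha_{acs}(s)(\hat\iota, \hat q') + 1$ and
(v) $\alpha_{acs}(s')(\hat\iota, \hat m) = \alpha_{acs}(s)(\hat\iota, \hat m) - 1$

since $d$ is the message extracted from $\mu(\iota)$ and $\hat m = \alpha_{msg}(\mathit{res}(\sigma, d))$.
By assumption we know $\alpha_{acs}(s) \le \mathbf{v}$, which implies

\[ \mathbf{v}(\hat\iota, \hat q) \ge 1 \quad \text{and} \quad \mathbf{v}(\hat\iota, \hat m) \ge 1 \]

and so we can define

\[
\mathbf{v}' := \mathbf{v}\left[
\begin{array}{l}
(\hat\iota, \hat q) \mapsto \mathbf{v}(\hat\iota, \hat q) - 1, \\
(\hat\iota, \hat m) \mapsto \mathbf{v}(\hat\iota, \hat m) - 1, \\
(\hat\iota, \hat q') \mapsto \mathbf{v}(\hat\iota, \hat q') + 1
\end{array}
\right];
\]

it is then clear that, using rule $r \in R$, $\mathbf{v} \to_{acs} \mathbf{v}'$ and

\[
\begin{aligned}
\alpha_{acs}(s')(\hat\iota, \hat q) &= \alpha_{acs}(s)(\hat\iota, \hat q) - 1 \le \mathbf{v}(\hat\iota, \hat q) - 1 = \mathbf{v}'(\hat\iota, \hat q) \\
\alpha_{acs}(s')(\hat\iota, \hat m) &= \alpha_{acs}(s)(\hat\iota, \hat m) - 1 \le \mathbf{v}(\hat\iota, \hat m) - 1 = \mathbf{v}'(\hat\iota, \hat m) \\
\alpha_{acs}(s')(\hat\iota, \hat q') &= \alpha_{acs}(s)(\hat\iota, \hat q') + 1 \le \mathbf{v}(\hat\iota, \hat q') + 1 = \mathbf{v}'(\hat\iota, \hat q').
\end{aligned}
\]

Hence, since $\alpha_{acs}(s) \le \mathbf{v}$, we can conclude $\alpha_{acs}(s') \le \mathbf{v}'$ as
desired.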

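- R = Send. Using Lemma 2, with $\hat s = \alpha_{cfa}(s)$, gives $\hat s \leadsto \hat s'$ with
active component $(\hat\iota, \hat q, \hat q')$ for the abstract rule AbsSend where
$\hat\iota = \alpha_{pid}(\iota)$, $\hat q = \alpha_{ps}(q)$ and $\hat q' = \alpha_{ps}(q')$. Examining the
concrete and abstract states we see $s = \langle \pi, \mu, \sigma \rangle$ and
$\hat s = \langle \hat\pi, \hat\mu, \hat\sigma \rangle$ where $\hat\pi = \alpha_{proc}(\pi)$, $\hat\sigma = \alpha_{st}(\sigma)$ and
$\hat\mu = \alpha_{ms}(\mu)$. Let the pid of the recipient be $\iota'$ and let $d$ be the
value enqueued to $\iota'$'s mailbox $\mu(\iota')$; inspecting the proof of
Theorem 1, the pid of the abstract recipient is $\hat\iota' := \alpha_{pid}(\iota')$ and the
sent abstract value is $\hat d = \alpha_{val}(d)$. Appealing to the soundness of the
message abstraction we obtain

\[ \hat m := \alpha_{msg}(\mathit{res}(\sigma, d)) \in \widehat{\mathit{res}}_{msg}(\hat\sigma, \hat d) \]

and hence we have

\[ r := \hat\iota : \hat q \xrightarrow{\hat\iota' ! \hat m} \hat q' \in R. \]

Additionally we know

(i) $\alpha_{acs}(s)(\hat\iota, \hat q) \ge 1$,
(ii) $\alpha_{acs}(s')(\hat\iota, \hat q) = \alpha_{acs}(s)(\hat\iota, \hat q) - 1$,
(iii) $\alpha_{acs}(s')(\hat\iota, \hat q') = \alpha_{acs}(s)(\hat\iota, \hat q') + 1$ and
(iv) $\alpha_{acs}(s')(\hat\iota', \hat m) = \alpha_{acs}(s)(\hat\iota', \hat m) + 1$

since $d$ is the message enqueued to $\mu(\iota')$ and $\hat m = \alpha_{msg}(\mathit{res}(\sigma, d))$.
From our assumption we know $\alpha_{acs}(s) \le \mathbf{v}$ and thus

\[ \mathbf{v}(\hat\iota, \hat q) \ge 1; \]

making the definition

\[
\mathbf{v}' := \mathbf{v}\left[
\begin{array}{l}
(\hat\iota, \hat q) \mapsto \mathbf{v}(\hat\iota, \hat q) - 1, \\
(\hat\iota, \hat q') \mapsto \mathbf{v}(\hat\iota, \hat q') + 1, \\
(\hat\iota', \hat m) \mapsto \mathbf{v}(\hat\iota', \hat m) + 1
\end{array}
\right]
\]

we observe that we are able to use rule $r \in R$ to make the step
$\mathbf{v} \to_{acs} \mathbf{v}'$. Further the inequalities

\[
\begin{aligned}
\alpha_{acs}(s')(\hat\iota, \hat q) &= \alpha_{acs}(s)(\hat\iota, \hat q) - 1 \le \mathbf{v}(\hat\iota, \hat q) - 1 = \mathbf{v}'(\hat\iota, \hat q) \\
\alpha_{acs}(s')(\hat\iota, \hat q') &= \alpha_{acs}(s)(\hat\iota, \hat q') + 1 \le \mathbf{v}(\hat\iota, \hat q') + 1 = \mathbf{v}'(\hat\iota, \hat q') \\
\alpha_{acs}(s')(\hat\iota', \hat m) &= \alpha_{acs}(s)(\hat\iota', \hat m) + 1 \le \mathbf{v}(\hat\iota', \hat m) + 1 = \mathbf{v}'(\hat\iota', \hat m)
\end{aligned}
\]

imply, since $\alpha_{acs}(s) \le \mathbf{v}$, that $\alpha_{acs}(s') \le \mathbf{v}'$, which concludes
the proof of this case.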

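- R = Spawn. Take $\hat s = \alpha_{cfa}(s)$; Lemma 2 gives us that $\hat s \leadsto \hat s'$
using abstract rule AbsSpawn with active component $(\hat\iota, \hat q, \hat q')$
where $\hat\iota = \alpha_{pid}(\iota)$, $\hat q = \alpha_{ps}(q)$ and $\hat q' = \alpha_{ps}(q')$. We note
that $s = \langle \pi, \mu, \sigma \rangle$, $s' = \langle \pi', \mu', \sigma' \rangle$ and $\hat s = \langle \hat\pi, \hat\mu, \hat\sigma \rangle$ where
$\hat\pi = \alpha_{proc}(\pi)$, $\hat\sigma = \alpha_{st}(\sigma)$ and $\hat\mu = \alpha_{ms}(\mu)$. Further we can
assume

\[
\begin{aligned}
\pi(\iota) &= \langle \mathtt{fun}() \to e, \rho, a, t \rangle \\
\sigma(a) &= \mathrm{Arg}_1\langle \ell, d, \rho', c \rangle \\
d &= (\mathtt{spawn}, \_) \\
\iota' &:= \mathit{new}_{pid}(\iota, \ell, \vartheta) \\
\pi'(\iota) &= \langle \iota', \rho', c, t \rangle =: q' \\
\pi'(\iota') &= \langle e, \rho, *, t_0 \rangle =: q''
\end{aligned}
\]

Noting that we are replicating the step $s \to s'$ in the abstract
$\hat s \leadsto \hat s'$, $\alpha_{cfa}(s) = \hat s$ and $\alpha_{pid} \circ \mathit{new}_{pid} = \widehat{\mathit{new}}_{pid} \circ \alpha$, we can see
that the new abstract pid created is $\hat\iota' = \alpha_{pid}(\iota')$ together with
its process state $\hat q'' = \alpha_{ps}(q'')$. Hence we can conclude that

\[ r := \hat\iota : \hat q \xrightarrow{\nu \hat\iota' . \hat q''} \hat q' \in R \]

and we observe that

(i) $\alpha_{acs}(s)(\hat\iota, \hat q) \ge 1$,
(ii) $\alpha_{acs}(s')(\hat\iota, \hat q) = \alpha_{acs}(s)(\hat\iota, \hat q) - 1$,
(iii) $\alpha_{acs}(s')(\hat\iota, \hat q') = \alpha_{acs}(s)(\hat\iota, \hat q') + 1$ and
(iv) $\alpha_{acs}(s')(\hat\iota', \hat q'') = \alpha_{acs}(s)(\hat\iota', \hat q'') + 1$.

Now the assumption $\alpha_{acs}(s) \le \mathbf{v}$ allows us to conclude

\[ \mathbf{v}(\hat\iota, \hat q) \ge 1; \]

so that we can define

\[
\mathbf{v}' := \mathbf{v}\left[
\begin{array}{l}
(\hat\iota, \hat q) \mapsto \mathbf{v}(\hat\iota, \hat q) - 1, \\
(\hat\iota, \hat q') \mapsto \mathbf{v}(\hat\iota, \hat q') + 1, \\
(\hat\iota', \hat q'') \mapsto \mathbf{v}(\hat\iota', \hat q'') + 1
\end{array}
\right]
\]

and use rule $r \in R$ to make the step $\mathbf{v} \to_{acs} \mathbf{v}'$. Further, with
the inequalities

\[
\begin{aligned}
\alpha_{acs}(s')(\hat\iota, \hat q) &= \alpha_{acs}(s)(\hat\iota, \hat q) - 1 \le \mathbf{v}(\hat\iota, \hat q) - 1 = \mathbf{v}'(\hat\iota, \hat q) \\
\alpha_{acs}(s')(\hat\iota, \hat q') &= \alpha_{acs}(s)(\hat\iota, \hat q') + 1 \le \mathbf{v}(\hat\iota, \hat q') + 1 = \mathbf{v}'(\hat\iota, \hat q') \\
\alpha_{acs}(s')(\hat\iota', \hat q'') &= \alpha_{acs}(s)(\hat\iota', \hat q'') + 1 \le \mathbf{v}(\hat\iota', \hat q'') + 1 = \mathbf{v}'(\hat\iota', \hat q'')
\end{aligned}
\]

and our assumption $\alpha_{acs}(s) \le \mathbf{v}$ we see that $\alpha_{acs}(s') \le \mathbf{v}'$,
which completes the proof of this case and the theorem.

□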
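
The counter updates used in each case of the proof can be summarised by a small Python sketch of how one rule acts on a counter vector. The encoding is an assumption made for illustration: a rule is a tuple `(pid, q, label, q_next)` with labels `("tau",)`, `("recv", m)`, `("send", target, m)` and `("spawn", new_pid, q0)`, and a configuration is a `Counter` over (pid, state) and (pid, message) pairs. It is not the data structure used by Soter.

```python
from collections import Counter

def fire(v, rule):
    """Fire one ACS-style rule on counter vector v; return the successor or None."""
    pid, q, label, q_next = rule
    needed = Counter({(pid, q): 1})            # one process in state q is consumed
    produced = Counter({(pid, q_next): 1})     # and moved to state q_next
    kind = label[0]
    if kind == "recv":                         # ("recv", m): take m from own mailbox
        needed[(pid, label[1])] += 1
    elif kind == "send":                       # ("send", target, m): add m to target
        produced[(label[1], label[2])] += 1
    elif kind == "spawn":                      # ("spawn", new_pid, q0): new process
        produced[(label[1], label[2])] += 1
    elif kind != "tau":
        raise ValueError(f"unknown label kind: {kind}")
    v = Counter(v)
    if any(v[key] < count for key, count in needed.items()):
        return None                            # rule not enabled
    successor = Counter(v)
    successor.subtract(needed)
    successor.update(produced)
    return +successor                          # unary plus drops zero entries

# Hypothetical rule: a client in state "ready" sends "req" to the server.
rule = ("client", "ready", ("send", "server", "req"), "waiting")
small = Counter({("client", "ready"): 1})
large = Counter({("client", "ready"): 2, ("server", "req"): 1})

small_next = fire(small, rule)
large_next = fire(large, rule)
blocked = fire(Counter(), rule)
```

Here `small <= large` pointwise and the same rule is enabled in both, with `small_next <= large_next`. This monotonicity is what each case of the proof exploits when it passes from $\alpha_{acs}(s) \le \mathbf{v}$ to $\alpha_{acs}(s') \le \mathbf{v}'$.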



